Abstract


A coloring of a graph is an assignment of colors to the vertices so that no two adjacent 
vertices are given the same color. The problem of coloring a graph with the minimum 
number of colors is well known to be NP-hard, even restricted to k-colorable graphs for
constant $k \ge 3$. This thesis explores the approximation problem of coloring k-colorable
graphs with as few additional colors as possible in polynomial time, focusing on the case of
k = 3.

For the worst-case problem, the previous best upper bound on the number of colors
needed for coloring 3-colorable n-vertex graphs in polynomial time is $O(\sqrt{n}/\sqrt{\log n})$ colors
by Berger and Rompel, improving a bound of $O(\sqrt{n})$ colors by Wigderson. We present
an algorithm to color any 3-colorable graph with $O(n^{3/8}\,\mathrm{polylog}(n))$ colors, breaking an
“$O(n^{1/2-o(1)})$ barrier”. The algorithm presented here is based on examining second-order
neighborhoods of vertices, rather than just immediate neighborhoods of vertices as in
previous approaches. We extend our results to improve the worst-case bounds for coloring
k-colorable graphs for constant k > 3 as well.

We also examine the problem of coloring random k-colorable graphs. We consider a 
standard model in which vertices are first randomly assigned to one of k color classes 
and then each edge between two vertices of different color is placed into the graph with 
probability p. For sufficiently high edge probability, it is known by results of Turner, Dyer 
and Frieze, and others, that such graphs are easy to k-color. We describe here an algorithm 
to k-color graphs generated in this way for a much wider range of edge probabilities
($p \ge n^{-1+\epsilon}$ for any constant $\epsilon > 0$) than previously possible.

To study a wider variety of graph distributions, we also present a model of graphs gen- 
erated by the semi-random source of Santha and Vazirani that provides a smooth transition 
between the worst-case and random models. In this model, the graph is generated by a 
“noisy adversary” — an adversary whose decisions (whether or not to insert a particular 
edge) have some small (random) probability of being reversed. We show that even for quite 
low noise rates, semi-random k-colorable graphs can be colored with high probability using 
just k colors.

Finally, we use assumptions about the worst-case difficulty of approximate graph col- 
oring to provide lower bounds for other hard problems. Using techniques developed by 
Berman and Schnitger, we show that if there is no polynomial-time algorithm to color 
k-colorable graphs with $O(\log n)$ colors, then the largest independent set in a graph (or
equivalently the largest clique) cannot be approximated to within a factor of $n^{1-\epsilon}$ for any
constant $\epsilon > 0$. This is a much higher lower bound than achieved by previous results, albeit
based on less solid assumptions. 
Chapter 1


Introduction 


A k-coloring of a graph is an assignment of one of k distinct colors to each vertex in the 
graph so that no two adjacent vertices are given the same color. The chromatic number of 
a graph is the smallest k such that the graph can be k-colored.

Graph coloring problems have a long history in mathematics and computer science. 
The famous 4-Color Problem, of whether every planar graph is 4-colorable, dates back at
least to 1852 [33]. Partly through that problem, finally solved by Appel and Haken [3], 
graph coloring has become a central topic in combinatorics. In computer science, graph 
coloring problems have long been known to model various scheduling problems such as 
examination scheduling and register allocation. Graph coloring is also closely related to 
other combinatorial problems such as finding the maximum independent set in a graph (the 
largest set of vertices such that no two have an edge between them). 

Unfortunately, from the algorithmic point of view, the problem of determining the
chromatic number of a graph is well known to be NP-Complete. The problem of deciding
whether a graph is k-colorable for any fixed $k \ge 3$ is NP-Complete as well. Thus, coloring
an arbitrary k-colorable graph with k colors for $k \ge 3$ cannot be done in polynomial time
unless P = NP (for k = 2, 2-coloring is easy). Knowing that the coloring problem is
NP-hard does not make it disappear, however, and it also does not necessarily mean nothing
useful can be done. It does mean that, as for many other famous hard problems such as the
Traveling Salesman Problem (TSP) and the Bin Packing problem, researchers attempting
to find good fast algorithms must consider issues of approximation.


This thesis concerns the algorithmic problem of finding good approximate colorings of
graphs for several natural forms of approximation. We focus here on deriving
polynomial-time algorithms for coloring graphs of constant chromatic number and on improving upon
previously known algorithmic guarantees. In particular, we both improve upon previous
guarantees for the number of colors needed in the worst case to properly color k-colorable
graphs in polynomial time, and extend the known classes of graphs for which optimal
colorings can be found quickly. We will not be so concerned here with precisely optimizing
the running time of the algorithms (so long as they are polynomial); instead we focus more
on the quality of the approximation. Because 3-chromatic graphs are the simplest and in
a sense the most fundamental graphs for which optimal coloring is NP-hard, much of this
thesis will focus on the special case of coloring graphs of chromatic number 3. We then
describe extensions of these results to graphs of higher constant chromatic number as well.


1.1 Applications of graph coloring 


Graph coloring problems arise in situations where one would like to assign a small number
of values to objects under pairwise constraints of the form that object x and object y cannot
receive the same value. Such situations occur often in various scheduling problems and we
present a few examples here.


Example 1: Examination Scheduling. 


Consider the problem of scheduling n final exams into a small number of different 
time slots. One would like to do so in a way such that no student has a conflicting 
schedule: that is, no student has two of her examinations at the same time. Suppose 
we assign one vertex in a graph to each examination and place an edge between two 
vertices if some student is taking both corresponding exams. Then the problem of 
scheduling the examinations into k time slots so no student has a conflict is exactly
the problem of coloring the corresponding graph with k colors [42][5].
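
The reduction in Example 1 can be sketched in a few lines of Python. This is only an illustration: the function names, the enrollment data, and the sample schedule are hypothetical and do not come from the cited work. The sketch builds the conflict graph from per-student exam lists and checks whether a proposed assignment of time slots is a proper coloring.

```python
from itertools import combinations


def build_conflict_graph(enrollments):
    """Build adjacency sets: one vertex per exam, and an edge between two
    exams whenever some student is taking both of them."""
    graph = {}
    for exams in enrollments.values():
        for exam in exams:
            graph.setdefault(exam, set())
        for a, b in combinations(sorted(set(exams)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def is_valid_schedule(graph, slot_of):
    """A schedule is conflict-free exactly when it is a proper coloring."""
    return all(slot_of[u] != slot_of[v] for u in graph for v in graph[u])


# Hypothetical enrollment data, for illustration only.
enrollments = {
    "ann": ["algebra", "biology"],
    "bob": ["biology", "chemistry"],
    "cy": ["algebra"],
}
graph = build_conflict_graph(enrollments)
schedule = {"algebra": 0, "biology": 1, "chemistry": 0}  # two time slots
print(is_valid_schedule(graph, schedule))
```

Here two time slots suffice because the conflict graph is a path, which is 2-colorable.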


Example 2: Register Allocation. 


A more “real computer science” problem, for which graph coloring techniques have
actually been used in practice, is the problem of register allocation in compilers. Work
in this direction has been done by several researchers including Chaitin [15], Chaitin et 
al. [16], and Briggs et al. [13]. During compilation, a standard compiler [15] transforms 
the source program into an intermediate language based on a hypothetical machine 
with an unlimited number of fast syntactic (virtual) registers. Since the real machine 
has only a bounded number of registers, the compiler in a “register allocation phase” 
must then map the computed values in the syntactic registers into the true registers of 
the machine (e.g., 17 registers in work of Chaitin et al. [16] or 32 registers in work of 
Chaitin [15]). If the compiler cannot do this exactly, it will be forced to “spill” some 
values into main memory through load and store operations. Because the registers
are fast, the hope is to spill as little of the computation as possible.


The relationship of this problem to graph coloring is as follows. For each procedure in
a program, Chaitin et al. build a “register interference graph” containing one vertex
for each value (e.g. a variable in the program) and an edge between two vertices if
the two values interfere and cannot be placed into the same register. Interference is
checked roughly by examining if both values are “live” at the same time, or more
precisely if one value is live at a definition point of the other. Thus, if we think of
the real registers as colors, an assignment of 17 or 32 colors to the vertices of the
interference graph (depending on which machine is being used) corresponds to an
assignment of registers to the computed values of the procedure.


Of course, it may be that the interference graph cannot be colored with the required 
number of colors. In that case, the uncolored vertices are “spilled” into main memory. 
So, the goal here is to color as many vertices as possible with the given number of 
colors (where “many” may be defined by some additional measure of cost and not 
just sheer quantity). As it turns out, once values are spilled, this requires additional 
vertices, usually of low degree, to be added onto the graph, so the abstraction as 
a standard coloring problem is not quite exact. Nonetheless, simple graph coloring
heuristics appear to work well in practice [15][16][13].


1.2 Forms of approximation, and past work 


For the graph coloring problem, the issue of approximation splits naturally into two general
directions. One direction is to consider worst-case graphs, but allow the number of colors
used to be non-optimal. In particular, one would like answers to the question:


Given an n-vertex k-colorable graph, how many colors do you need in order to
color the graph in polynomial time?


A second general direction is to relax the restriction that the graph be worst case and
attempt to find optimal colorings for large or nicely characterized subsets of the inputs.
That is, one would like answers to the question:


While coloring k-colorable graphs with k colors in the worst case is hard, can
you find a large subset of cases where k-coloring is easy?


1.2.1 Approximate coloring in the worst case


For graphs of constant chromatic number, the first nontrivial result along the first direction
presented here was due to Wigderson [43]. Wigderson gives an algorithm to color any
n-vertex 3-colorable graph with $O(\sqrt{n})$ colors, and more generally to color any k-colorable
graph with $O(n^{1-1/(k-1)})$ colors. More recently, several researchers (Berger and Rompel [6],
Linial, Saks, and Wigderson [24], and Raghavan [32]) have independently improved upon
this bound to color k-colorable graphs with $O((n/\log n)^{1-1/(k-1)})$ colors, which for k = 3
results in a coloring of 3-colorable graphs with $O(\sqrt{n}/\sqrt{\log n})$ colors.
The result of Berger and Rompel, et al. was important because no progress had been
made for some time and it showed that $\sqrt{n}$ was in no sense a lower bound for
coloring 3-colorable graphs. However, for the kinds of techniques used it was clear that, say,
$O(\sqrt{n}/\log^2 n)$ colors would be completely out of reach. The difficulty in improving these
results motivated work of Linial and Vazirani [25] who provide some evidence for an $n^{\epsilon}$
lower bound for the general chromatic number approximation problem.

For general graphs of arbitrary chromatic number, the best algorithmic result known to
date is due to Halldórsson [22]. Halldórsson’s algorithm has a performance guarantee (that
is, a ratio of the number of colors used to the chromatic number) of $O(n(\log\log n)^2/(\log n)^3)$.
This result is based upon an algorithm by Boppana and Halldórsson [12] for the Independent
Set problem which finds an independent set within an $n/(\log n)^2$ factor of the maximum.

There has also been recent work on coloring graphs presented in an on-line manner; that 
is, coloring graphs presented one vertex at a time in some arbitrary order. Vishwanathan [41] 
presents an algorithm for such a model that uses a number of colors within a logarithmic
factor of the Wigderson bound.


1.2.2 Exact coloring in special cases 


Many classical results on graph coloring can be thought of from the point of view of the 
second direction described here. These results prove nice characterizations that are sufficient 
conditions for k-colorability, and the characterizations are often testable in polynomial time. 
For example, the famous 4-Color Problem and Theorem gives an easy way to prove a graph 
to be 4-colorable — one simply checks that the graph is planar. In fact, the 4-Color Theorem 
of Appel and Haken is known to yield a polynomial-time coloring algorithm for such graphs. 
Of course, if the graph turns out not to be planar, then this technique says nothing about 
the graph’s chromatic number. For graphs of chromatic number 3, similar classical results 
are known. Grötzsch ([5], p. 355) proved that any planar graph without triangles must be
3-colorable, and this was extended to hold for graphs with at most 3 triangles by Grünbaum
[21].¹ The proofs of both results involve reducing a graph to one with fewer vertices in ways
that yield polynomial-time coloring algorithms. For graphs of general chromatic number,
Brooks’ Theorem [14][26] states that any connected graph of maximum degree d (d > 2) is
either d-colorable or else is a single (d + 1)-clique. Note that it is very easy to d-color any
graph with maximum degree $d - 1$: for each vertex in an arbitrary order, simply give to
that vertex any color in $\{1, \ldots, d\}$ not held at the time by any of its neighbors. Steinberg
[37] presents a survey of such classical results, focusing on 3-chromatic graphs.

¹ In some sense this is “best possible” since the 4-clique $K_4$ is planar with four triangles and is not
3-colorable. See [5].
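
The easy d-coloring of graphs with maximum degree d − 1 noted above is the usual greedy rule. The following Python sketch illustrates it; the adjacency-set representation and the function name are our own assumptions. It always picks the least available color, which is one valid way of choosing "any color not held by a neighbor".

```python
def greedy_coloring(graph, order=None):
    """Give each vertex, in the given order, the least color in {1, 2, ...}
    not already held by one of its neighbors.

    A vertex with at most d - 1 neighbors sees at most d - 1 forbidden
    colors, so some color in {1, ..., d} is always free."""
    colors = {}
    for v in (order if order is not None else graph):
        used = {colors[u] for u in graph[v] if u in colors}
        c = 1
        while c in used:
            c += 1
        colors[v] = c
    return colors


# A 5-cycle has maximum degree 2, so the rule uses at most 3 colors.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
coloring = greedy_coloring(cycle)
print(coloring)
```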
Instead of presenting such explicit characterizations of easy-to-color families of graphs,
one can also study random families of graphs. Turner [38], Kucera [23], and Dyer and 
Frieze [18] give polynomial-time algorithms that color random k-colorable graphs with k 
colors with high probability, for any constant k. So, most k-colorable graphs are easy to 
k-color. In fact, Dyer and Frieze go further and provide an algorithm that when amortized 
over all n-vertex k-colorable graphs, spends on average polynomial time per graph. Petford 
and Welsh [31] present experimental work using heuristics for coloring random 3-colorable 
graphs and claim success for a wide range of edge probabilities. 

It is not known how to color general random graphs (where we do not restrict the
chromatic number) in polynomial time with the minimum number of colors, but one can get fairly close.
For the model G(n, p) of an n-vertex graph in which each edge is included with probability
p, Bollobás [10] has shown that the chromatic number will be $(1 + o(1))\, n/(2\log_b n)$ with
high probability, for $b = 1/(1-p)$. It is not hard to show that the greedy algorithm (in some
order, give to each vertex the color of least index not yet held by any of its neighbors) finds
a coloring of at most $(1 + o(1))\, n/\log_b n$ colors, a factor of 2 above optimal. Matula [27]
provides quasi-polynomial approaches with provably better bounds.


1.3 New results and a plan of the thesis 


This thesis presents results in both of the two directions discussed above. For k-colorable 
graphs for constant k, we both provide better approximation guarantees for the worst-case 
problem and expand the classes of graphs for which optimal coloring is known to be easy. 

The major portion of this thesis concerns the first direction discussed, that of finding
improved approximation guarantees for the worst-case problem. We present an algorithm 
that uses a quite different strategy from that used by the algorithms of Wigderson and 
Berger and Rompel and others, and colors any 3-colorable graph with $O(n^{3/8} \log^{8/5} n)$ colors.
Thus, we improve the previous bound of $O(\sqrt{n}/\sqrt{\log n})$ colors and break a “soft-$O(\sqrt{n})$
barrier” (that is, ignoring polylogarithmic factors). The algorithm we present also extends
to graphs of higher constant chromatic number and improves upon the previous bounds for 
such graphs. We present the new algorithm in two parts: the first part (Chapter 4) colors 
3-colorable graphs with $O(n^{2/5+o(1)})$ colors, and the second part (Chapter 5) achieves the
better bound claimed above. The strategy used also suggests a plausible path for further 
significant reductions in the color bounds, and a discussion of this is given in Chapter 
10. The algorithms presented for the worst-case problem are motivated by techniques that 
would work if the graph were in fact chosen randomly, and this motivation and the general 
flavor of the algorithms are given in Chapter 3. 


Along the second direction, we extend the class of randomly chosen k-colorable graphs for
which a k-coloring can be found in polynomial time. In particular, we consider a standard
model of a random k-colorable graph in which vertices are first randomly assigned to one of
k color classes and then each edge between two vertices of different color is placed into the
graph with probability p. For this model, we are able to find colorings for a wider range of
edge probabilities ($p \ge n^{-1+\epsilon}$ for any constant $\epsilon > 0$) than was previously known. These
results are described in Chapter 7.

While the known results on random graphs imply that most k-colorable graphs are easy 
to k-color, random k-colorable graphs tend to be of a very special type. For example, with 
high probability all vertices of a random k-colorable graph have nearly the same degree and 
vertices of the same color class all have nearly the same number of common neighbors. So, 
graphs created in only a “somewhat random” manner may not be colored well by algorithms 
for the random case. To explore a wider variety of graph distributions, we present in Chapter 
8 a model of graphs created by the semi-random source of Santha and Vazirani [34] that 
provides a smooth transition between the worst-case and random models. In this model,
the graph is generated by a “noisy adversary”: an adversary whose decisions (whether
or not to insert a particular edge) have some small probability of being reversed. We show
that even for quite low noise rates, these semi-random k-colorable graphs can be colored 
with high probability using just k colors. The discussion of random and semi-random graph 
models is based in part on work joint with Joel Spencer. 

In addition to the above-mentioned general directions, we describe in Chapter 9 how 
hardness assumptions for approximately coloring graphs in the worst case can be used to 
provide lower bounds for other hard problems. In particular, we use a technique developed 
by Berman and Schnitger [7] to prove the following result. Suppose there were a
polynomial-time algorithm to find an independent set in a graph of size at most a factor of $n^{1-\epsilon}$ smaller
than the size of the largest independent set, for some constant $\epsilon > 0$. Then one could
convert such a procedure into one that colors k-colorable graphs with $O(\log n)$ colors, for any
constant k. Also, one could convert such a procedure into one that colors (log n)-colorable
graphs with polylog(n) colors. This contrasts with the best algorithm known to date [22] for
coloring (log n)-colorable graphs which uses more than $n/(\log n)^2$ colors. So, these results
imply that a seemingly small improvement in approximating independent sets implies one 
can get a much larger improvement for approximate graph coloring. In contrapositive 
form, these results present a high lower-bound for Independent Set approximation based 
on a hardness assumption for graph coloring that is quite far from the best algorithmic 
guarantees currently known. 


Some of the work in this thesis has previously appeared in extended abstract form [8][9]. 


Chapter 2 


Notation, definitions, and previous algorithms 


In this chapter we review some standard graph-theoretic definitions and introduce basic 
notation that will be used throughout this thesis. At the end of the chapter we will describe 
some previous worst-case coloring algorithms in order to introduce a few useful techniques. 

Given a graph G, let V(G) denote the vertices of G and E(G) denote the edges of G. 
We will use N(v) to denote the neighborhood of a vertex v and d(v) to denote the vertex 
degree. That is, for G = (V, E):

• $N(v) = \{w \in V \mid (v, w) \in E\}$, and

• $d(v) = |N(v)|$.

It will also be convenient to define the degree D(S) of a set of vertices S by:

• $D(S) = \sum_{v \in S} d(v)$,

and the neighborhood N(S) of set S by:

• $N(S) = \bigcup_{v \in S} N(v) = \{w \in V \mid (v, w) \in E \text{ for some } v \in S\}$.

Notice that D(S) may be much larger than |N(S)| if vertices in S share many neighbors in
common. We will also use the term “distance-2 neighbors” of a vertex v to mean the set
N(N(v)). Note that if $N(v) \ne \emptyset$ then $v \in N(N(v))$.
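
The definitions above translate directly into code. The sketch below is illustrative only: it assumes a graph stored as a dictionary mapping each vertex to the set of its neighbors, and the function names are ours. The 4-cycle example shows both remarks: D(S) can exceed |N(S)|, and v belongs to N(N(v)) whenever N(v) is nonempty.

```python
def degree(graph, v):
    """d(v) = |N(v)|, where graph[v] is the neighbor set N(v)."""
    return len(graph[v])


def set_degree(graph, S):
    """D(S): the sum of d(v) over v in S."""
    return sum(len(graph[v]) for v in S)


def set_neighborhood(graph, S):
    """N(S): the union of N(v) over v in S."""
    result = set()
    for v in S:
        result |= graph[v]
    return result


def distance2_neighbors(graph, v):
    """The "distance-2 neighbors" N(N(v)) of a vertex v."""
    return set_neighborhood(graph, graph[v])


# The 4-cycle 0-1-2-3-0: vertices 1 and 3 share both of their neighbors.
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
S = {1, 3}
print(set_degree(square, S), len(set_neighborhood(square, S)))
print(distance2_neighbors(square, 0))
```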

An independent set in a graph is a set of vertices no two of which are adjacent to each 
other. A vertex cover is a set W such that every edge in the graph has at least one endpoint
in W; that is, it is a set W such that $V - W$ is independent.

As mentioned in the introduction, the chromatic number of a graph is the least number 
of colors needed to color the graph so that no two adjacent vertices are given the same 
color. As is standard terminology [29], we will say that a graph is k-chromatic to mean 
that the chromatic number is exactly k, and that a graph is k-colorable to mean that the 
chromatic number is at most k. For the most part, this distinction will not be important
and we will use the terms interchangeably. We say that an algorithm t-colors a graph if it
colors the graph with at most t colors, and it optimally colors a graph if it colors with the
fewest number of colors possible.

For the special case where G is a 3-colorable graph, we use red, blue, and green to denote 
the colors of vertices in G under some legal (but unknown) 3-coloring. We also use these 
terms to denote the sets of vertices belonging to each color class under that legal coloring. 

For functions f and g we say $g(n) = \tilde{O}(f(n))$ to denote that $g(n) = O(f(n) \log^c n)$ for
some constant c. Similarly, we will use $g(n) = \tilde{\Omega}(f(n))$ to denote that $g(n) = \Omega(f(n)/\log^c n)$
for some constant c. We also use “$g(n) \gg f(n)$” to mean that $f(n) = o(g(n))$. Finally, we
use the following general standard notation:

• $(m)_i = m(m-1)(m-2)\cdots(m-i+1)$.

• $K_t$ is the clique on t vertices.

• For S a subset of vertices of graph G, the graph $H = G|_S$ is the subgraph of G induced
by set S. That is, $V(H) = S$ and $E(H) = \{(i, j) \in E(G) \mid i, j \in S\}$.

The term “log n” will be used to denote $\log_2 n$, and $\log^2 n$ will be used to denote $(\log n)^2$.


2.1 Previous algorithms 


As is well known, 2-colorable graphs can easily be 2-colored in polynomial time. For exam- 
ple, the following procedure suffices to color any 2-colorable graph with the colors 0 and 1. 
First, assign a color, say 0, to one vertex in each connected component in the graph. Then 
assign color 1 to each neighbor of a vertex colored 0. Finally, repeat, assigning color 0 to 
any uncolored neighbor of a vertex of color 1, and color 1 to any uncolored neighbor of a 
vertex colored 0, and so on, until the entire graph is colored. The resulting coloring will be 
legal since 2-colorable graphs have no odd cycles. 
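
The 2-coloring procedure just described amounts to a breadth-first traversal of each connected component. A minimal Python sketch follows, assuming the graph is given as a dictionary of neighbor sets; the function name and the choice to return None on non-bipartite input are our own.

```python
from collections import deque


def two_color(graph):
    """Color a 2-colorable graph with colors 0 and 1.

    Returns a dict vertex -> color, or None if an odd cycle is detected
    (that is, if the input is not 2-colorable)."""
    color = {}
    for start in graph:
        if start in color:
            continue
        color[start] = 0  # one seed vertex per connected component
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in color:
                    color[w] = 1 - color[v]  # opposite color of v
                    queue.append(w)
                elif color[w] == color[v]:
                    return None  # two adjacent vertices forced to agree
    return color


square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(two_color(square))
print(two_color(triangle))
```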

Let us now review Wigderson’s algorithm [43] for the special case of 3-colorable graphs. 
Wigderson’s algorithm looks at the immediate neighborhoods of vertices, and uses the fact 
that in a 3-colorable graph the neighborhood of any vertex is 2-colorable. The algorithm 
proceeds as follows. If there exists a vertex of degree at least $\sqrt{n}$ in the graph, then we
color its neighborhood with two unused colors and then delete the colored nodes from the
graph. If all vertices have degree less than $\sqrt{n}$, we can greedily $\sqrt{n}$-color the remaining
graph, since with $\sqrt{n}$ colors, for each vertex we are guaranteed that at least one color is not
used on its neighbors. The total number of colors used is at most $3\sqrt{n}$. If we pick a degree
cutoff of $\sqrt{2n}$ instead of $\sqrt{n}$, we can optimize the constant for this type of strategy to $\sqrt{8}$.
A more formal description of the algorithm is given below.
Wigderson’s Algorithm

Given G = (V, E), a 3-colorable graph on n vertices.

1. Initialize color c to 0.

2. While there exists a vertex $v \in V$ with $d(v) \ge \sqrt{n}$,
   (a) 2-color N(v) with colors c, c + 1.
   (b) Let $c \leftarrow c + 2$, $V \leftarrow V - N(v)$.
   (Note that the loop in this step can be executed at most $\sqrt{n}$ times.)

3. Color the remaining graph with colors $c, c + 1, \ldots, c + \sqrt{n} - 1$, by arbitrarily
   assigning to each vertex a color not held by any of its neighbors.
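
The three steps above can be sketched in Python as follows. This is an illustrative rendering rather than a tuned implementation: the graph is assumed to be a dictionary of neighbor sets, the helper names are ours, degrees are measured in the graph that remains after deletions, and the greedy step picks the least free color (one valid way of choosing "a color not held by any neighbor").

```python
import math
from collections import deque


def two_color_induced(graph, vertices, base):
    """2-color the subgraph induced by `vertices` with colors base, base + 1."""
    vertices = set(vertices)
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = base
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in vertices:
                    continue
                if w not in color:
                    color[w] = base + 1 if color[v] == base else base
                    queue.append(w)
                elif color[w] == color[v]:
                    raise ValueError("a neighborhood is not 2-colorable")
    return color


def wigderson_color(graph):
    """Color a 3-colorable graph with at most about 3 * sqrt(n) colors."""
    threshold = math.sqrt(len(graph))
    remaining = set(graph)
    coloring = {}
    c = 0  # Step 1
    while True:  # Step 2
        v = next((u for u in remaining
                  if len(graph[u] & remaining) >= threshold), None)
        if v is None:
            break
        nbrs = graph[v] & remaining
        coloring.update(two_color_induced(graph, nbrs, c))
        c += 2
        remaining -= nbrs
    for v in remaining:  # Step 3: greedy, using colors c, c + 1, ...
        used = {coloring[u] for u in graph[v] if u in coloring}
        k = c
        while k in used:
            k += 1
        coloring[v] = k
    return coloring


# Complete tripartite graph with three parts of size 3 (n = 9, 3-colorable).
verts = [(part, i) for part in range(3) for i in range(3)]
tripartite = {u: {w for w in verts if w[0] != u[0]} for u in verts}
result = wigderson_color(tripartite)
print(len(set(result.values())))
```

The ValueError branch fires only when the input violates the 3-colorability assumption, since in a 3-colorable graph every neighborhood is 2-colorable.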


The improvement to $O(\sqrt{n}/\sqrt{\log n})$ of Berger et al. mentioned previously results from
choosing (log n) starting vertices instead of one. This can be done by selecting an arbitrary 
subset of vertices of size (3 log n), and trying each subset of size (log n); one such subset must 
be monochromatic under some legal 3-coloring of G and so has a 2-colorable neighborhood. 
The way that this set is then exploited is described in [6]. We will revisit this algorithm in 
Chapter 3, where the algorithm and bounds guaranteed follow as an easy corollary of the 
machinery described there. 

In contrast to the above strategies, the new worst-case algorithm presented here is a 
multi-pronged attack. The main idea of the new approach is to take advantage of informa- 
tion from not just the immediate neighbors of vertices, but from distance-2 neighbors as 
well. One difficulty with looking at distance-2 neighbors is that they have not so obvious 
a structure as the immediate neighbors. For example, the immediate neighborhood, as 
noted earlier, is 2-colorable; the structure of the distance-2 neighbors will have to be more
carefully brought out.


Chapter 3 


Worst-case bounds: preliminaries 


3.1 New worst-case approach: the basic idea 


The previous best algorithms for coloring 3-colorable graphs all used $\tilde{O}(n^{1/2})$ colors in the
worst case. This section describes the basic idea for an algorithm to color any n-vertex
3-colorable graph G with $\tilde{O}(n^{\alpha})$ colors, for some $\alpha < 1/2$. Note that to do so, it is enough,
as in Wigderson’s algorithm, to find an independent or 2-chromatic set of size $\tilde{\Omega}(n^{1-\alpha})$,
since that set can be colored with 1 or 2 colors and the procedure repeated on the graph
remaining.

The idea of the new algorithm is to try to make progress from examining distance-2 
neighbors, and not just the immediate neighborhoods of vertices as in previous algorithms. 
We will describe the motivation for the approach by considering the question: “what if 
the edges in the graph were distributed randomly?” That is, what if after an adversary 
decided which nodes to place in the sets red, blue, and green (the color classes under a legal 
3-coloring unknown to the algorithm) a coin of some bias p was then flipped for each pair 
of vertices u,v of different colors to determine whether edge (u,v) would be in the graph? 
In that case, the following strategy finds an independent set of size $\tilde{\Omega}(n^{2/3})$.

First, we may assume there are about the same number of red, blue, and green vertices,
since otherwise we could immediately separate at least one of the color classes from the
others by just looking at the vertex degrees.¹ Second, we may assume that the vertices have
average degree at least $n^{1/3}$, since otherwise we could just greedily gather an independent
set of size $\Omega(n^{2/3})$. Finally, for simplicity, we assume that the average degree d is at most
$n^{1/2-\epsilon}$ for some $\epsilon > 0$ (so we have $n^{1/3} \le d \le n^{1/2-\epsilon}$). This last requirement will simplify
the motivational argument, but is not necessary.

Suppose v is a red vertex. Then, the neighborhood of v consists of blue and green
vertices, with approximately half of each color if the numbers of blue and green vertices
in the graph are roughly equal. Each blue vertex in N(v) similarly has about half green
neighbors and half red neighbors, and each green vertex has about half blue neighbors and
half red neighbors. So, if we look at the set of the distance-2 neighbors S = N(N(v)), red
vertices are significantly more predominant than blue or green vertices. In fact, about half
of S is red, a quarter blue, and a quarter green, since we have assumed d is small enough (at
most $n^{1/2-\epsilon}$) that not many vertices of S are neighbors of several vertices of N(v). Thus,
S is a set of size at least $\Omega(n^{2/3})$ that has within it an independent set (the red vertices) of
about one half the size of S.²

¹ Once we have separated one of the color classes from the others, we can then easily 2-color the graph
remaining. This fact about the sizes of the color classes for random graphs does not generalize to worst-case
graphs, and in fact, there is no analog of this step used in the worst-case algorithm. It is inserted here solely
to simplify our picture of the graph.


Given a set S of size $\Omega(n^{2/3})$ containing an independent set of size $\frac{1}{2}|S|$, and therefore
a vertex cover of size $\frac{1}{2}|S|$, we can algorithmically find an independent set of size $\tilde{\Omega}(n^{2/3})$
by applying a vertex-cover approximation algorithm due to Bar-Yehuda and Even [4] and
(independently) to Monien and Speckenmeyer [28].³ Their algorithm finds a vertex cover
of size at most $\left(2 - \frac{\log\log n}{2\log n}\right)$ times the size of the minimum vertex cover in the graph. If we
apply the algorithm to the graph induced by S, we find a vertex cover W in S of size at most
$\frac{1}{2}|S|\left(2 - \frac{\log\log|S|}{2\log|S|}\right)$, which is at most $|S| - |S|/(4\log|S|)$. So, the complement, $S - W$, is an
independent set inside S of size at least $\Omega(|S|/\log|S|) = \tilde{\Omega}(n^{2/3})$. Thus, in the case where
the edges in the graph are chosen by a random process, we have found a large independent
set. In Chapter 7, we see how in fact to do much better for random graphs and actually
3-color random 3-colorable graphs for $p \ge n^{\epsilon-1}$ (i.e., where the average degree is at least
$n^{\epsilon}$ for some $\epsilon > 0$).
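
The arithmetic behind the bound on the vertex cover W can be spelled out as follows. This is a sketch under assumptions of ours: logarithms are base 2, |S| ≥ 4 (so that log log |S| ≥ 1), the approximation ratio on the |S|-vertex induced graph is read as 2 − (log log |S|)/(2 log |S|), and the minimum vertex cover of that graph has size at most ½|S|.

```latex
\begin{align*}
|W| &\le \tfrac{1}{2}|S|\left(2 - \frac{\log\log|S|}{2\log|S|}\right)
      = |S| - \frac{|S|\,\log\log|S|}{4\log|S|}
      \le |S| - \frac{|S|}{4\log|S|}, \\
|S - W| &= |S| - |W| \;\ge\; \frac{|S|}{4\log|S|}.
\end{align*}
```

Since x/log x is increasing for x ≥ 3, substituting $|S| = \Omega(n^{2/3})$ gives $|S - W| = \Omega(n^{2/3}/\log n)$, which is the claimed bound up to a logarithmic factor.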


Worst-case graphs, however, are not random. Instead, we will use various techniques 
to force the graph to have properties of random graphs, or at least weak versions of these 
properties, that we need. One such property is that of being “well-distributed”: we want 
N(N(v)), or at least an easy-to-select subset of N(N(v)), to have nearly half red vertices, 
so that the vertex-cover approximation algorithm can be used. The second such property 
is an expansion property: we want the selected subset of N(N(v)) to be significantly larger
than N(v), so that our performance is much better than that achieved by looking only at
immediate neighbors.


Chapters 4 and 5 describe one general method for proving the existence of a form of
good distribution in worst-case graphs and two methods for forcing expansion. The first
method for forcing expansion (described in Chapter 4) is simple and elegant and results in
a coloring of any 3-colorable graph with $\tilde{O}(n^{2/5})$ colors; the second (described in Chapter 5)
is more complicated, but results in an improved bound of $\tilde{O}(n^{3/8})$ colors.


² We can remove the restriction $d \le n^{1/2-\epsilon}$ by choosing S to be a subset of N(N(v)) generated by
conceptually deleting edges from the graph at random until the average degree is below $n^{1/2-\epsilon}$, and then
letting S = N(N(v)) in this new graph.

³ Their algorithms differ slightly but the bounds are essentially the same. A version of their algorithm is
described in Appendix A for completeness.
3.2 A few additional definitions


We now present a few additional definitions that will be needed in Chapters 4 and 5. Given
a graph $G = (V, E)$ on $n$ vertices:

• For $v \in V$, let $d_T(v) = |N(v) \cap T|$. We call $d_T(v)$ the degree into $T$ of $v$.

• For $S, T \subseteq V$, let $D_T(S) = \sum_{v \in S} d_T(v)$. We call $D_T(S)$ the degree into $T$ of $S$.
  Note that $d_T(v) = D_{\{v\}}(T)$ and $D_T(S) = D_S(T)$.

• Let $\delta = \delta(n) = \frac{1}{5 \log n}$.

• Let $I_j = \{v \in V \mid d(v) \in [(1+\delta)^j, (1+\delta)^{j+1})\}$ for $j = 0, 1, 2, \ldots$. That is, we divide
  the set of vertices of degree at least 1 into bins $I_j$ so that in each bin, the ratio of
  the degrees of any two vertices is less than $(1+\delta)$. The number of bins is at most
  $\log_{1+\delta} n < (1 + o(1)) \frac{1}{\delta} \ln n < 5 \log^2 n$.

• For $S \subseteq V$, let $N_i(S) = \{v \in N(S) \mid d_S(v) \in [(1+\delta)^i, (1+\delta)^{i+1})\}$ for $i = 0, 1, 2, \ldots$.
  In other words, $N_i(S)$ ($0 \le i < \log_{1+\delta} n$) is the subset of vertices in $N(S)$ that are hit
  by at least $(1+\delta)^i$ and less than $(1+\delta)^{i+1}$ edges from $S$.
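These definitions translate directly into set operations. The following Python sketch is illustrative only: it assumes the graph is stored as a dictionary `adj` mapping each vertex to the set of its neighbors, and the function names are hypothetical, not taken from the thesis.

```python
import math

def degree_into(adj, v, T):
    """d_T(v) = |N(v) ∩ T|."""
    return len(adj[v] & T)

def total_degree_into(adj, S, T):
    """D_T(S) = sum over v in S of d_T(v)."""
    return sum(len(adj[v] & T) for v in S)

def bin_index(x, delta):
    """Index j with (1+delta)^j <= x < (1+delta)^(j+1), for x >= 1."""
    j = int(math.floor(math.log(x) / math.log(1 + delta)))
    # Guard against floating-point rounding at bin boundaries.
    while (1 + delta) ** j > x:
        j -= 1
    while (1 + delta) ** (j + 1) <= x:
        j += 1
    return j

def degree_bins(adj, delta):
    """Map j -> I_j, the vertices whose degree falls in the j-th bin."""
    bins = {}
    for v, nbrs in adj.items():
        if nbrs:  # only vertices of degree at least 1 are binned
            bins.setdefault(bin_index(len(nbrs), delta), set()).add(v)
    return bins

def neighbor_bins(adj, S, delta):
    """Map i -> N_i(S), grouping the vertices of N(S) by how many edges from S hit them."""
    hits = {}
    for u in set(S):
        for w in adj[u]:
            hits[w] = hits.get(w, 0) + 1
    out = {}
    for w, h in hits.items():
        out.setdefault(bin_index(h, delta), set()).add(w)
    return out
```

On a star with center 0 and leaves 1, 2, 3, for example, `neighbor_bins(adj, {1, 2, 3}, 0.5)` places the center in the bin of vertices hit by three edges from the leaves.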


3.3 Useful definitions of progress 


In order to more easily describe and analyze the coloring algorithms presented, it will be
useful to have several formal notions of “making progress” towards an $f(n)$-coloring of an
n-vertex graph. These notions simplify the analysis by allowing us to aim for intermediate
goals. While we will only need to consider $f(n)$ a function of the form $O(n^{\alpha} \log^{\beta} n)$, the
notions of progress in fact hold for a more general class of “nearly-polynomial” functions,
as defined below.


Definition 3.1 A function $f$ over $\mathbb{Z}^+$ is nearly-polynomial if it is non-decreasing and
there exist constants $c, c' > 1$ such that for all sufficiently large $N$,

$$f(2N) \ge c f(N) \quad \text{and} \quad f(2N) \le c' f(N).$$

For example, if $f(n) = n^{1/2}$, then we may choose $c = c' = 2^{1/2}$. If $f(n) = n^{\alpha} \log^{\beta} n$ for
$\alpha > 0$, then we may choose $c = 2^{\alpha}(1 - \epsilon)$ and $c' = 2^{\alpha}(1 + \epsilon)$ for any constant $\epsilon > 0$.
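As a quick numeric illustration of Definition 3.1, the following Python sketch evaluates the doubling ratio $f(2N)/f(N)$ for the two example families. The helper names are hypothetical, and the use of base-2 logarithms is an assumption made only for this illustration.

```python
import math

def doubling_ratio(f, N):
    """Return f(2N) / f(N), the quantity bounded by c and c' in Definition 3.1."""
    return f(2 * N) / f(N)

def poly_log(alpha, beta):
    """Build f(n) = n^alpha * (log2 n)^beta."""
    return lambda n: n ** alpha * math.log2(n) ** beta

root = poly_log(0.5, 0.0)          # f(n) = n^(1/2)
first_approx = poly_log(0.4, 1.6)  # f(n) = n^(2/5) log^(8/5) n

for N in (2 ** 10, 2 ** 20, 2 ** 40):
    # The second ratio approaches 2^0.4 from above as N grows.
    print(N, doubling_ratio(root, N), doubling_ratio(first_approx, N))
```

For the second family the ratio equals $2^{\alpha}(1 + 1/\log N)^{\beta}$, which is why any constant $\epsilon > 0$ suffices once $N$ is large enough.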


Three important ways of making progress towards an $f(n)$-coloring of an n-vertex
k-colorable graph are defined as follows.

Progress Type 1: [Large-IS] Find an independent or 2-colorable⁴ set $S$ of size $\Omega(n/f(n))$.

Progress Type 2: [Small-Nbhd] Find an independent or 2-colorable set $S$ such that
$|N(S)| = O(f(n)|S|)$.

Progress Type 3: [Same-Color] Find two vertices that must be the same color under any
legal k-coloring of the graph.


Progress Type 1 “makes progress” because we can color the set found with at most two 
colors, throw away the colored vertices, pick two new colors to work with and continue. The 
idea for progress Type 2 is that we can use it to find many different 2-colorable sets, each of 
which is independent of the others because each set has a small neighborhood; combining 
the sets found gives us a large 2-colorable set and thereby progress of Type 1. Progress Type 
3 always helps us towards any approximate coloring. More formally, besides showing that
each type of progress is useful individually, we would like to say that any combination of the
three types of progress, in any order, yields an $O(f(n))$-coloring of an n-vertex k-colorable
graph.


Lemma 3.1 If there exists a polynomial-time algorithm A that is guaranteed, given any
k-colorable graph of m vertices, to make progress of either Type 1, 2 or 3 towards an
$O(f(m))$-coloring (where $f$ is nearly-polynomial), then there exists a polynomial-time
algorithm B that colors any n-vertex k-colorable graph G with $O(f(n))$ colors.


Progress Type 1 and a weaker variant of Type 2 were used by Wigderson [43]. In fact,
if we do not care about constants, we can state Wigderson’s algorithm for coloring n-vertex
3-colorable graphs with $O(n^{1/2})$ colors as follows. If a vertex $v$ has a neighborhood of
size $\Omega(n^{1/2})$ then we make progress of Type 1 using its neighborhood; otherwise, $|N(v)| =
O(1 \cdot n^{1/2})$ so we make progress Type 2.
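For concreteness, here is a minimal Python sketch of Wigderson's algorithm in its usual direct form, rather than phrased through the progress types. It assumes the graph is a dictionary `adj` of neighbor sets and that the input is 3-colorable; the function names and the exact threshold `isqrt(n)` are choices made for this illustration.

```python
import math
from collections import deque

def two_color(adj, vertices):
    """Properly 2-color the subgraph induced by `vertices`; return None if it is not bipartite."""
    vertices = set(vertices)
    side = {}
    for start in vertices:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x] & vertices:
                if y not in side:
                    side[y] = 1 - side[x]
                    queue.append(y)
                elif side[y] == side[x]:
                    return None
    return side

def wigderson_color(adj):
    """Color a 3-colorable graph with O(sqrt(n)) colors."""
    threshold = max(1, math.isqrt(len(adj)))
    live = set(adj)
    color, next_color = {}, 0
    while True:
        # Large neighborhood: it is 2-colorable, so spend two fresh colors on it.
        v = next((x for x in live if len(adj[x] & live) >= threshold), None)
        if v is None:
            break
        nbrs = adj[v] & live
        side = two_color(adj, nbrs)
        if side is None:
            raise ValueError("graph is not 3-colorable")
        for x, s in side.items():
            color[x] = next_color + s
        next_color += 2
        live -= nbrs
    # Every remaining vertex has fewer than `threshold` live neighbors: color greedily.
    for v in live:
        used = {color[y] for y in adj[v] if y in color}
        c = next_color
        while c in used:
            c += 1
        color[v] = c
    return color
```

Each round of the first loop removes at least $\sqrt{n}$ vertices (rounded down) at a cost of two colors, and the greedy phase needs at most that many further colors, which gives the $O(n^{1/2})$ total.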

We can also state simply the algorithm of Berger and Rompel [6] to color any 3-colorable
graph with $O(\sqrt{n}/\sqrt{\log n})$ colors using these types of progress (here, $f(n) = \sqrt{n}/\sqrt{\log n}$).
Select a subset $S$ of $3 \log n$ vertices in graph G arbitrarily and examine every independent
subset $\hat{S}$ of $S$ of size $\log n$. Note that there are at most $\binom{3 \log n}{\log n} < n^3$ such subsets, so
this can be done in polynomial time. For each subset $\hat{S}$, test to see if its neighborhood is
2-colorable; this test will succeed for some $\hat{S}$ since at least one such subset must consist of
vertices all the same color in some legal 3-coloring of G. Now, if $|N(\hat{S})| \ge \sqrt{n}\sqrt{\log n}$, we
have made progress of Type 1. If $|N(\hat{S})| < \sqrt{n}\sqrt{\log n}$, then we have made progress of Type
2.


⁴Technically, an independent set is 2-colorable. We list both here to emphasize there is no need for the
set S to require 2 colors. Also, we label this type of progress by “Large-IS” since given a 2-chromatic set,
one can easily find an independent subset of only a factor of 2 smaller.
We now prove Lemma 3.1, showing that these types of progress really do “make progress”.


Proof of Lemma 3.1: First, if algorithm A ever makes progress of Type 3 [Same-Color] 
on a subgraph of G, then since the two vertices u and v found must be the same color under 
any k-coloring of the subgraph, they also must be the same color under any k-coloring of G. 
So, we can just merge the vertices u and v into a new vertex with neighborhood $N(u) \cup N(v)$
and start again from the beginning: in doing so, we remove one vertex from G and use no 
colors. Thus, we may assume from now on that A only makes progress of Types 1 or 2 
when applied to any subgraph of G. 

Claim: If for some constant $\epsilon > 0$ we can always find a 2-colorable set of size $\epsilon m/f(m)$
in a k-colorable graph of m vertices, then we can achieve an $O(f(n))$-coloring of G as follows.
We find such a set in G, color it with two colors, remove those vertices from the graph, and
repeat.

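Proof of Claim: The proof is just a straightforward calculation given below. The
number $C(m)$ of colors used satisfies $C(m) \le 2 + C(m - \epsilon m/f(m))$. Since $f$ is a
nearly-polynomial function, for each $m'$ in the range $[m/2, m]$, we have:

$$\begin{aligned}
C(m') &\le 2 + C(m' - \epsilon m'/f(m')) \\
      &\le 2 + C(m' - \epsilon (m/2)/f(m)). \qquad \text{(because $f$ is non-decreasing)}
\end{aligned}$$

Applying this last inequality $f(m)/\epsilon$ times, we get $C(m) \le 2f(m)/\epsilon + C(m/2)$, which
implies

$$\begin{aligned}
C(m) &\le \frac{2}{\epsilon}\left[f(m) + f(m/2) + \cdots + f(1)\right] \\
     &\le \frac{2}{\epsilon} f(m)\left[1 + \frac{1}{c} + \frac{1}{c^2} + \cdots\right] + O(1) \qquad \text{(since $f(n) \ge c f(n/2)$ for $n$ large enough)} \\
     &\le \left[\frac{2c}{\epsilon(c-1)} + o(1)\right] f(m) \\
     &= O(f(m)).
\end{aligned}$$

∎ (End proof of claim.)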
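The recurrence can also be checked numerically. The Python sketch below simulates the peeling process of the claim under the assumption that every round removes exactly $\lceil \epsilon m'/f(m') \rceil$ of the $m'$ remaining vertices; the function names are hypothetical, and the comparison uses only the leading term of the bound.

```python
import math

def colors_used(m, f, eps):
    """Count colors when each round 2-colors and removes ceil(eps * m' / f(m')) remaining vertices."""
    colors = 0
    while m > 0:
        removed = max(1, math.ceil(eps * m / f(m)))
        m -= removed
        colors += 2
    return colors

def claimed_bound(m, f, eps, c):
    """Leading term (2c / (eps (c - 1))) f(m) of the bound derived in the claim."""
    return 2 * c / (eps * (c - 1)) * f(m)

# Example with f(n) = n^(1/2), for which c = 2^(1/2) by Definition 3.1.
m, eps = 10_000, 1.0
print(colors_used(m, math.sqrt, eps), claimed_bound(m, math.sqrt, eps, math.sqrt(2)))
```

The simulated count stays below the leading term here because the geometric sum in the proof is a deliberately loose estimate.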


Thus, to prove the lemma, we just need some algorithm B’ that on any k-colorable graph
of m vertices finds a 2-colorable set of size $\Omega(m/f(m))$. Algorithm B’ works as follows.


On input $(V, E)$, where $m = |V|$,
1. Initialize set $U$ to the empty set and initialize $V'$ to $V$.
2. While $|V'| > m/2$ do:

   (a) Let $(V', E')$ be the subgraph induced by the vertices in $V'$. Run algorithm A on
       $(V', E')$.

   (b) If A returns with progress of Type 1 [Large-IS], then since $|V'| > m/2$, we have
       a 2-colorable set of size $\Omega(|V'|/f(|V'|)) = \Omega(m/f(m))$ (since $f$ is nearly-polynomial),
       so halt and output that set.

   (c) If A returns with progress of Type 2 [Small-Nbhd], let $S$ denote the set returned
       by A. Now, update:

       $U \leftarrow U \cup S$
       $V' \leftarrow V' - (S \cup N(S))$.

       Notice that in this step, each time we add vertices to $U$, we remove all their
       neighbors from $V'$. So, we maintain the invariant that $U$ has no neighbors in $V'$.
3. Halt and output $U$.
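The loop structure of algorithm B’ can be sketched in Python as follows. This is an illustration under stated assumptions: `progress_oracle` is a hypothetical stand-in for algorithm A that returns a pair `(kind, S)` with `kind` equal to `"large-is"` or `"small-nbhd"` (Type 3 progress is assumed to have been handled by merging vertices beforehand), and `singleton_oracle` is a toy oracle used only to exercise the loop.

```python
def find_large_two_colorable(vertices, adj, progress_oracle):
    """Sketch of algorithm B': accumulate Type 2 sets until half the graph is consumed."""
    m = len(vertices)
    U = set()
    live = set(vertices)  # plays the role of V'
    while len(live) > m / 2:
        kind, S = progress_oracle(live, adj)
        S = set(S)
        if kind == "large-is":
            return S  # step 2(b): already a large 2-colorable set
        # Step 2(c): add S to U and delete S together with its neighborhood.
        nbhd = set()
        for v in S:
            nbhd |= adj[v] & live
        U |= S
        live -= S | nbhd
    return U  # step 3

def singleton_oracle(live, adj):
    """Toy stand-in for algorithm A: report the smallest live vertex as a Type 2 set."""
    return "small-nbhd", {min(live)}
```

Because each accepted set is removed together with its neighbors, later sets can never be adjacent to earlier ones, which is the invariant the proof relies on.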


If we reach step 3 in the above algorithm, it must be that at that point, $|V'| \le m/2$.
Set $U$ is a 2-colorable set since each set $S$ added to $U$ in step 2(c) is 2-colorable and by
the invariant mentioned in 2(c), the sets $S$ are all independent of each other (thus, we may
use the same 2 colors on each set $S$). Set $U$ is also large because for each set $S$ of size
$r$ found in 2(c), we add $r$ vertices to $U$ and remove at most $r + t r f(m)$ vertices from $V'$
for some constant $t$ by the definition of progress Type 2 [Small-Nbhd].⁵ Thus, $|V - V'|$ is
at least $m/2$ and $|V - V'|$ is at most $|U| + t|U|f(m)$. Combining the two inequalities, we
find $|U| + t|U|f(m) \ge m/2$, which implies $|U| = \Omega(m/f(m))$. This large 2-colorable set is
exactly what we needed from algorithm B’. ∎


By Lemma 3.1, we now may just aim for progress of one of the three types in our coloring 
algorithms. This fact will simplify the statements and correctness proofs of algorithms 
presented in Chapters 4, 5, and 6. 

Also, as a simple application of these types of progress, note that progress Type 2
[Small-Nbhd] can be used to guarantee that for each vertex $v$, the set $N(N(v))$ has size
$\Omega(f(n)^2)$: we make progress if $|N(v)| < f(n)$ since $\{v\}$ is an independent set and make
progress if $|N(N(v))| < f(n)|N(v)|$ since $N(v)$ is 2-colorable. Thus, we get the following
corollary. (We assume here that $f$ is nearly-polynomial.)

Corollary 3.2 If G is an n-vertex 3-colorable graph such that $|N(N(v))| = O(f(n)^2)$ for
some vertex $v$, then we can make progress towards an $O(f(n))$-coloring of G.


⁵Here we use the fact that f is non-decreasing.


Chapter 4 


Worst-case bounds for 3-colorable graphs: first 
algorithm 


In this chapter, we describe an algorithm to color any n-vertex 3-colorable graph with
roughly $n^{0.4}$ colors (more precisely, $O(n^{2/5} \log^{8/5} n)$ colors, as stated in Section 4.2).
As mentioned in the last chapter, the algorithm consists of two major parts.
First, we force the graph without loss of generality to have a useful expansion property.
Second, we find and take advantage of a form of good distribution of edges that we show
must exist in any 3-colorable graph. Some of the theorems we prove, in particular those in
Section 4.3 concerning the distribution property, hold more generally for graphs constrained
only to have large independent sets. This fact will be useful for us later in Chapter 6 for
extending these techniques to graphs of higher chromatic number.


4.1 Forcing expansion 


In this section, we show that if our goal is to color a 3-colorable graph G with $O(f(n))$
colors, where $f$ is a nearly-polynomial function as in Definition 3.1, then we may assume
without loss of generality that no two vertices share more than $n/[f(n)]^2$ neighbors. So,
for example, if we wish to color with $O(n^{\alpha})$ colors, we may assume for all $u, v \in V$ that
$|N(u) \cap N(v)| < n^{1-2\alpha}$ (for $\alpha = 0.4$, the shared neighborhood may have size at most $n^{0.2}$).
This is our first method for forcing expansion in the graph.

Bounding the number of neighbors that may be shared by two vertices forces expansion
in the following way. Suppose we wish to color with $n^{\alpha}$ colors. If we look at the neighborhood
of some vertex $v$ and consider an arbitrary subset of $m + d(v)$ edges leaving $N(v)$, then
we may assume those edges enter into at least $m/n^{1-2\alpha}$ other vertices. The reason is that
otherwise, some vertex $w \ne v$ must have more than $n^{1-2\alpha}$ neighbors in $N(v)$. This fact will
be useful when we show in Section 4.3 how to find such a set of m edges whose endpoints
contain an easy-to-find independent set.

Given the three methods for making progress defined in the last chapter, this method
for forcing expansion falls out easily. Throughout this section, we assume $f$ is a
nearly-polynomial function.
Theorem 4.1 If G is an n-vertex 3-colorable graph containing vertices $u$ and $v$ such that

$$|N(u) \cap N(v)| = \Omega\left(n/[f(n)]^2\right),$$

then we can make progress of Type 1, 2, or 3 towards an $O(f(n))$-coloring of G.

Proof: Suppose $u$ and $v$ are two vertices that share a neighborhood $S = N(u) \cap N(v)$
of size $\Omega(n/[f(n)]^2)$. Clearly, $S$ is 2-colorable since it is a subset of the neighborhood of $u$.
So, if $|N(S)| < n/f(n)$, then we have made progress Type 2 [Small-Nbhd]. On the other
hand, if $|N(S)| \ge n/f(n)$ and $N(S)$ is 2-colorable, then we have made progress of Type 1
[Large-IS]. The last possibility is that $N(S)$ is not 2-colorable (and that it is large, but we
will not need this fact). But, this last case means that $u$ and $v$ must be the same color
under any legal 3-coloring of G. The reason is that if $u$ and $v$ could possibly be different
colors under some legal 3-coloring (say blue and green) then $S$ would be monochromatic
(red), so $N(S)$ would be 2-colorable (blue and green). So, if our attempt to 2-color $N(S)$
fails, then we make progress of Type 3 [Same-Color]. ∎
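The case analysis in this proof is short enough to write out directly. The Python sketch below assumes the graph is a dictionary `adj` of neighbor sets and that `threshold` stands for $n/f(n)$; the function names and the string labels for the three progress types are hypothetical.

```python
from collections import deque

def is_two_colorable(adj, vertices):
    """Check by breadth-first search whether the subgraph induced by `vertices` is bipartite."""
    vertices = set(vertices)
    side = {}
    for start in vertices:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x] & vertices:
                if y not in side:
                    side[y] = 1 - side[x]
                    queue.append(y)
                elif side[y] == side[x]:
                    return False
    return True

def progress_from_shared_neighbors(adj, u, v, threshold):
    """Follow the proof of Theorem 4.1 for a pair u, v with many shared neighbors."""
    S = adj[u] & adj[v]
    neighborhood = set()  # N(S)
    for x in S:
        neighborhood |= adj[x]
    if len(neighborhood) < threshold:
        return "small-nbhd", S            # Type 2
    if is_two_colorable(adj, neighborhood):
        return "large-is", neighborhood   # Type 1
    return "same-color", (u, v)           # Type 3
```

On a 4-clique with the edge between `u` and `v` removed, for instance, the two shared neighbors are adjacent to each other, so $N(S)$ contains a triangle and the sketch reports that `u` and `v` must receive the same color.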


We can use the same argument as above to guarantee without loss of generality that
a selected set $S$ of size $\Omega(n/f(n)^2)$ in G is not monochromatic under any legal 3-coloring
of G. In particular, suppose $S$ were monochromatic, so $N(S)$ is 2-colorable. Then, if
$|N(S)| \ge n/f(n)$ we make progress Type 1 [Large-IS], and if $|N(S)| < n/f(n)$ we make
progress Type 2 [Small-Nbhd]. If instead $N(S)$ is large and not 2-colorable, then $S$ cannot be
monochromatic. So, we get the following corollary.

Corollary 4.2 Given an independent set $S$ of size $\Omega(n/f(n)^2)$ in an n-vertex 3-colorable
graph G, we can either make progress towards an $O(f(n))$ coloring of G or else guarantee
that the vertices of $S$ are not all the same color under any legal 3-coloring of G.

While this corollary will not be immediately useful for us here, an improved, more
complicated method for forcing expansion (described in Chapter 5) consists in part of an
improvement to this corollary, and leads to better coloring guarantees.


4.2 The algorithm 


We now describe the algorithm for coloring n-vertex 3-colorable graphs with $O(n^{2/5} \log^{8/5} n)$
colors. As mentioned in the last chapter, the algorithm uses a vertex cover approximation
algorithm of Bar-Yehuda and Even [4] and (independently) Monien and Speckenmeyer [28]
that finds a vertex cover of size at most $\left(2 - \frac{\log\log N}{2 \log N}\right)$ times the size of the minimum vertex
cover in an N-vertex graph. We will call their algorithm the BE/MS algorithm. A simpler version of
their procedure for the special case in which it is used in this thesis is given as Algorithm
Approx-IS in Appendix A.
Algorithm First-Approx:
Given: $G = (V, E)$, a 3-colorable graph on n vertices. Let $f(n) = n^{2/5}(\log n)^{8/5}$.

Output: Progress of Type 1, 2, or 3 towards an $O(n^{2/5}(\log n)^{8/5})$-coloring of G.

1. [Min degree] For each vertex $v$, if $d(v) < f(n)$, make progress Type 2 [Small-Nbhd].

2. [Expansion] For each pair of vertices $u, v$, if $|N(u) \cap N(v)| > n/[f(n)]^2$, then
   make progress using Theorem 4.1.

3. [Dist-2 Neighbors] Otherwise, for each vertex $v$, for each $i, j \in \{0, 1, \ldots, 5 \log^2 n\}$:
   Let $T_{v,i,j} = N_i(N(v) \cap I_j)$.
   (Recall the definitions of Section 3.2.)

4. [VC approx] Run the BE/MS Vertex-Cover approximation algorithm on each
   $T_{v,i,j}$. If we find an independent set of size $\Omega(n^{3/5}/(\log n)^{8/5})$, we have made
   progress Type 1 [Large-IS].


The next two sections are devoted to proving the following theorem.

Theorem 4.3 (Main Theorem) Algorithm First-Approx makes progress of Types 1, 2, or
3 towards an $O(n^{2/5}(\log n)^{8/5})$-coloring of any n-vertex 3-colorable graph.

Using Lemma 3.1 (the usefulness of making progress), we get the following corollary.

Corollary 4.4 There exists a polynomial-time algorithm that will color any 3-colorable
n-vertex graph with $O(n^{2/5}(\log n)^{8/5})$ colors.


Let us calculate the running time of the coloring algorithm. The BE/MS algorithm runs
in time $O(NM)$ on any N-vertex graph with M edges. We may assume for simplicity that
the graph in Step 4 of algorithm First-Approx has size at most $n^{3/5}$, or else we just remove excess
vertices at random. So, the running time of algorithm First-Approx, which is dominated by
Steps 3 and 4, is at most:

$$[(n \text{ vertices}) \cdot (\log^2 n \ j\text{'s}) \cdot (\log^2 n \ i\text{'s}) \text{ in Step 3}] \times [n^{3/5}(n^{3/5})^2 \text{ for vertex cover in Step 4}] = O(n^{14/5} \log^4 n),$$

which is polynomial in n. Note that this is the time needed to give one color to roughly $n^{3/5}$
vertices (ignoring logarithmic factors). One may have to run the algorithm roughly $n^{2/5}$ times
in order to color the entire graph.




4.3 Forcing good distribution 


From the last sections, we know that if we wish to color an n-vertex graph with $O(f(n))$
colors, then we may assume that the graph has minimum degree $f(n)$ (or else we make
progress Type 2 [Small-Nbhd]) and no two vertices share more than $n/[f(n)]^2$ neighbors (or
else we make progress with Theorem 4.1).

The goal of this section is to show how, given such a graph G, to find a small number of
subgraphs such that at least one must be both nearly half red under some legal 3-coloring
of G (at least $\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ of its vertices red), and large (size $\Omega(f(n)^4/n) = \Omega(n^{3/5})$ for
$f(n) = \Omega(n^{2/5})$, ignoring polylogarithmic factors). In particular, we will show this holds true
for one of a small number of subsets of the neighbors of the neighbors of $v$ for some vertex $v$
in the graph.

We will assume without loss of generality that red is the color in G such that $D(\text{red}) =
\max(D(\text{red}), D(\text{blue}), D(\text{green}))$. That is, of the three colors, red is the color with the most
edges incident to vertices of that color. The assumption on red implies that $D(\text{red}) \ge
\frac{1}{2}(D(\text{blue}) + D(\text{green}))$, so

$$D_{\text{red}}(\text{blue} \cup \text{green}) \ge \tfrac{1}{2} D(\text{blue} \cup \text{green}). \qquad (4.1)$$

Note also that if $d$ is the average degree of the vertices in G, then $D(\text{red}) \ge d|\text{red}|$.


4.3.1 The basic approach, and a counterexample to the naive strategy 


In order to find a large subgraph that is nearly half red, the first step will be to find a large
subset $S \subseteq \text{blue} \cup \text{green}$ such that nearly half of the edges leaving $S$ enter into red vertices.
We know that if we look at the entire set $\text{blue} \cup \text{green}$, at least half of the edges leaving
that set enter into red vertices (equation (4.1)). The problem is: we do not know how to
find $\text{blue} \cup \text{green}$. We can, however, look at subsets of $\text{blue} \cup \text{green}$ by considering vertex
neighborhoods, many of which (for red starting vertices) will be blue and green.

Given the property of $\text{blue} \cup \text{green}$ described in equation (4.1), one might expect that
this property would hold for the neighborhood of some vertex as well: that is, that for some
$v \in \text{red}$, we would have $D_{\text{red}}(N(v)) \ge \frac{1}{2} D(N(v))$. Unfortunately, this may not necessarily
be the case, and what follows is a counterexample to this seemingly innocent claim.

Consider a graph with $m$ red vertices $r_0, \ldots, r_{m-1}$, $m+1$ green vertices $g_0, \ldots, g_m$, and
$m+1$ blue vertices $b_0, \ldots, b_m$. Vertices $g_m$ and $b_m$ are two distinguished vertices with large
degree and twice as many edges into blue or green vertices as into red vertices. The rest
of the vertices have low degree, but together there are enough edges with red endpoints so
that $D(\text{red})$ is greater than $D(\text{blue})$ or $D(\text{green})$. More specifically, the edges in the graph
are (see Figure 4.1):

$$\begin{aligned}
& \{(g_m, r_0), (g_m, r_1), \ldots, (g_m, r_{m/2-1})\} \\
\cup\ & \{(g_m, b_0), (g_m, b_1), \ldots, (g_m, b_{m-1})\} \\
\cup\ & \{(b_m, r_{m/2}), (b_m, r_{m/2+1}), \ldots, (b_m, r_{m-1})\} \\
\cup\ & \{(b_m, g_0), (b_m, g_1), \ldots, (b_m, g_{m-1})\} \\
\cup\ & \{(g_i, r_i), (g_i, r_{(i+1) \bmod m})\} \quad \text{for each } 0 \le i \le m-1 \\
\cup\ & \{(b_i, r_i), (b_i, r_{(i+1) \bmod m})\} \quad \text{for each } 0 \le i \le m-1.
\end{aligned}$$

Figure 4.1: A counterexample to the naive strategy. For clarity, only edges incident
to the distinguished vertices $g_m$ and $b_m$ and incident to a typical red vertex are given.
The four edges between the red vertex and the non-distinguished blue and green vertices
are shown as dashed lines.

That is, vertices $g_m$ and $b_m$ are each connected to a different half of the red vertices
and each are connected to all the vertices of index less than $m$ of the remaining color. In
addition, each $r_i$ is connected to two green and two blue vertices of index less than $m$.

So, $D(\text{red}) = 5m$, $D(\text{green}) = (4 + \frac{1}{2})m$, and $D(\text{blue}) = (4 + \frac{1}{2})m$. But, for each
red vertex $v$, we have $D_{\text{red}}(N(v)) = 8 + m/2$ and $D_{V - \text{red}}(N(v)) = 4 + m$, which implies
$D(N(v)) = 12 + 3m/2$. So, $D_{\text{red}}(N(v))$ is approximately one third of $D(N(v))$ rather
than one half. One can also construct variations of this counterexample in which the ratio
between $D_{\text{red}}(N(v))$ and $D(N(v))$ is even worse.
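The degree counts in this counterexample are easy to confirm mechanically. The following Python sketch builds the graph from the edge list and evaluates the quantities quoted above; the vertex labels `("r", i)`, `("g", i)`, `("b", i)` and the helper names are hypothetical, and $m$ is assumed to be even and at least 4.

```python
def build_counterexample(m):
    """Adjacency sets for the counterexample graph; m is assumed even and at least 4."""
    adj = {}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for i in range(m):
        add_edge(("g", m), ("b", i))  # g_m sees every ordinary blue vertex
        add_edge(("b", m), ("g", i))  # b_m sees every ordinary green vertex
        # g_m takes the first half of the red vertices, b_m the second half.
        add_edge(("g", m) if i < m // 2 else ("b", m), ("r", i))
        for c in ("g", "b"):
            add_edge((c, i), ("r", i))
            add_edge((c, i), ("r", (i + 1) % m))
    return adj

def degree_sum(adj, S, T=None):
    """D_T(S); with T omitted this is D(S), the sum of the degrees of the vertices in S."""
    return sum(len(adj[v] if T is None else adj[v] & T) for v in S)

m = 10
adj = build_counterexample(m)
by_color = {c: {v for v in adj if v[0] == c} for c in "rgb"}
red = by_color["r"]
print({c: degree_sum(adj, by_color[c]) for c in "rgb"})
for v in sorted(red):
    # D_red(N(v)) and D(N(v)) for each red vertex v
    print(v, degree_sum(adj, adj[v], red), degree_sum(adj, adj[v]))
```

For every red vertex the ratio of the two printed numbers is $(8 + m/2)/(12 + 3m/2)$, which tends to one third as $m$ grows.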

The problem here is that the vertices have wildly varying degrees. While one can also 


find variations on this counterexample that hold even when all vertices have degrees in the 
range [n°~‘,n°t*] for any € > 0, if we restrict the vertex degrees extremely tightly then the
desired property does hold. That is, if the degrees are nearly identical, then there exists
$v \in V$ such that $N(v)$ has nearly half the edges leaving it entering into red vertices. This is
the purpose of the bins $I_j$ and is the intuition for Theorem 4.5 below.

Once we have a set $S \subseteq N(v)$ with nearly half the edges leaving it entering into red
vertices, we again use a similar idea to find a large set inside $N(S)$ which is nearly half red.
The trick again is to separate vertices according to degree, which is the purpose of the sets
$N_i(S)$. This step is handled by Theorem 4.6.


4.3.2 Theorems and proofs 


We now describe the theorems that allow the above basic idea and the algorithm First-Approx 
to succeed. These theorems are stated in terms of not-necessarily 3-colorable graphs con- 
taining a large independent set R. (The symbol “R” is used to be suggestive of the set 
red.) 


Theorem 4.5 Given an n-vertex graph $G = (V, E)$ with average vertex degree $d$, and an
independent set $R$ such that (1) $D_R(V - R) \ge \lambda D(V - R)$ for some $0 < \lambda < 1$ and (2)
$D(R) \ge d|R|$, then for some $v \in R$ and some bin $I_j$:

1. $|N(v) \cap I_j| \ge \delta^2 d/\log_{1+\delta} n$,
2. $D_R(N(v) \cap I_j) \ge \lambda(1 - 3\delta) D(N(v) \cap I_j)$.

In other words, for some $v \in R$, the set $N(v) \cap I_j$ is a reasonably large fraction of $N(v)$
and has almost a fraction $\lambda$ of the edges incident to it going into $R$. We now look at the
neighbors of $N(v) \cap I_j$ and show that for some $i$, the set $N_i(N(v) \cap I_j)$ has the properties
we need.


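Theorem 4.6 Given an n-vertex graph $G = (V, E)$, a set $R \subseteq V$, and $\lambda' \in [0, 1]$:
For any set $S$ such that $D_R(S) \ge \lambda' D(S)$, there must exist some $i < \log_{1+\delta} n$ such that:

1. $D_{N_i(S) \cap R}(S) \ge \delta D_R(S)/(\log_{1+\delta} n)$,
2. $|N_i(S) \cap R|/|N_i(S)| \ge (1 - 2\delta)\lambda'$.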


Assuming for now the correctness of Theorems 4.5 and 4.6, we can prove a corollary
showing why at least one of the sets created in Step 3 of Algorithm First-Approx will both
be large and contain an independent set of nearly half its vertices (and so be of the right
form for the vertex-cover algorithm used in Step 4).
Corollary 4.7 Given an n-vertex 3-colorable graph $G = (V, E)$ such that (1) no two
vertices share more than $s$ neighbors and (2) G has minimum degree $d_{\min} > \max\{s(1 +
\delta), (3 \log_{1+\delta} n)/\delta\}$, then for some $v \in V$ and some $i, j \in [0, 5 \log^2 n]$, the set

$$T = N_i(N(v) \cap I_j)$$

has at least $\Omega\left((d_{\min})^2/(s \log^7 n)\right)$ vertices of which at least a fraction $\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ are colored
red under some legal 3-coloring of G.


Proof of Corollary 4.7: By definition of set red in G, the conditions of Theorem 4.5
are satisfied for $R = \text{red}$ and $\lambda = 1/2$ (see equation (4.1)). Let vertex $v$ and bin $I_j$ be such
that claims (1) and (2) of Theorem 4.5 are satisfied for $S = N(v) \cap I_j$. By claim (2) of
Theorem 4.5, set $S$ satisfies the conditions of Theorem 4.6 with $\lambda' = \frac{1}{2}(1 - 3\delta)$. Let $i$ be the
index such that claims (1) and (2) of Theorem 4.6 are satisfied and let $T = N_i(S)$. Then:

$$\begin{aligned}
D_{T \cap R}(S) &\ge \delta D_R(S)/(\log_{1+\delta} n) && \text{(Theorem 4.6, claim 1)} \\
&\ge \delta[\lambda(1 - 3\delta) D(S)]/(\log_{1+\delta} n) && \text{(Theorem 4.5, claim 2)} \\
&\ge \delta\lambda(1 - 3\delta)[d_{\min}|S|]/(\log_{1+\delta} n) && \text{(for all $v$, $d(v) \ge d_{\min}$)} \qquad (4.2) \\
&\ge \delta^3\lambda(1 - 3\delta) d_{\min}^2/(\log_{1+\delta} n)^2 && \text{(Theorem 4.5, claim 1)} \\
&= \Omega\left(\delta^5 d_{\min}^2/\log^2 n\right) && \text{(using $\log_{1+\delta} n = O(\tfrac{1}{\delta} \log n)$)} \\
&= \Omega\left(d_{\min}^2/\log^7 n\right). && \text{($\delta = \tfrac{1}{5 \log n}$)}
\end{aligned}$$

Since no two vertices share more than $s$ neighbors and $S \subseteq N(v)$, we know no vertex $w \ne v$
has more than $s$ neighbors in $S$. Since we have also assumed that $d_{\min} > s(1 + \delta)$, we know
that the set $N_i(S)$ containing $v$ contains no other vertices besides $v$ by definition of $N_i$.
Also, since $d_{\min} > (3 \log_{1+\delta} n)/\delta$, by equation (4.2) we have $D_{T \cap R}(S) > |S|$ so we know
$T \ne \{v\}$ and thus $v \notin T$. So, set $T$ consists only of vertices with at most $s$ neighbors in $S$
and we have:

$$|T| \ge D_{T \cap R}(S)/s = \Omega\left(d_{\min}^2/(s \log^7 n)\right).$$

Also, the fraction of red vertices in $T$ is large:

$$\begin{aligned}
|T \cap R|/|T| &\ge \lambda(1 - 2\delta)(1 - 3\delta) && \text{(Theorems 4.5 claim 2, and 4.6 claim 2)} \\
&\ge \tfrac{1}{2}(1 - 5\delta) && \text{(by definition of red, we have $\lambda \ge 1/2$)} \\
&= \tfrac{1}{2}\left(1 - \tfrac{1}{\log n}\right).
\end{aligned}$$

Thus, set $T$ satisfies both claims of the corollary. ∎


Before proving Theorems 4.5 and 4.6, we state a simple combinatorial lemma: 
Lemma 4.8 Given $b$ balls of which $r$ are red, all placed in $k$ boxes, then for any $\epsilon$ ($0 < \epsilon <
1$), there is some box with at least $\epsilon r/k$ red balls such that the ratio of the number of red balls
to the total number of balls inside that box is more than $(1 - \epsilon) r/b$.

Proof: Throw out all boxes with fewer than $\epsilon r/k$ red balls. The minimum possible
ratio of red balls to total balls left is $(r - \epsilon r)/(b - \epsilon r)$, since at worst we throw out $k$ boxes
containing only red balls. This ratio is strictly greater than $(1 - \epsilon) r/b$. So, by pigeonholing,
there must exist at least one box left with a ratio of red balls to total balls at least this
large. ∎
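A direct search makes the lemma concrete. The Python sketch below scans the boxes for one that meets both conditions; the representation of a box as a `(red, total)` pair and the function name are assumptions of this illustration, and exact rational arithmetic is used so that the strict inequality is tested without rounding.

```python
from fractions import Fraction

def find_good_box(boxes, eps):
    """boxes is a list of (red, total) ball counts; return an index satisfying Lemma 4.8, or None."""
    k = len(boxes)
    r = sum(red for red, _ in boxes)
    b = sum(total for _, total in boxes)
    for index, (red, total) in enumerate(boxes):
        enough_red = red * k >= eps * r                # at least eps * r / k red balls
        good_ratio = red * b > (1 - eps) * r * total   # red / total > (1 - eps) * r / b
        if enough_red and good_ratio:
            return index
    return None

# Three boxes holding 6 red balls out of 20 in total.
print(find_good_box([(1, 10), (5, 6), (0, 4)], Fraction(1, 2)))
```

Whenever at least one ball is red, the lemma guarantees that the search does not return `None`.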


Proof of Theorem 4.5: For convenience, we call vertices in the independent set $R$ “red”.
First, we show there exists a good bin. We are given that $D_R(V - R) \ge \lambda D(V - R)$.
We apply Lemma 4.8 where there is one “box” for each of the $\log_{1+\delta} n$ bins $I_j$. For each
$v \in V - R$, if $v \in I_j$, we place $d(v)$ “balls” of which $d_R(v)$ are red into box $j$. So, the number
of balls in box $j$ equals $D(I_j \cap (V - R))$ out of which $D_R(I_j \cap (V - R))$ are red, and the
number of balls total is $D(V - R)$ of which $D_R(V - R)$ are red. Lemma 4.8 tells us, taking
$\epsilon = \delta$, that for some $j_0$, if we let $I = I_{j_0} \cap (V - R)$, then:

$$D_R(I) \ge \delta D_R(V - R)/(\log_{1+\delta} n) \quad \text{and} \qquad (4.3)$$
$$D_R(I) \ge \lambda(1 - \delta) D(I). \qquad (4.4)$$

Informally, the set $I$ of non-red vertices has the property that many edges have endpoints
in $I$ (since $D_R(I) = \Omega(D(V - R))$ up to polylogarithmic factors, by equation (4.3)), that almost
a $\lambda$ fraction of the edges leaving $I$ enter red nodes (equation (4.4)), and that all nodes in $I$
have similar degrees (since $I \subseteq I_{j_0}$). We do not know how to distinguish between edges with
endpoints in $R$ and other sorts of edges, so we do not know which $I_j$ contains $I$, only that such
an $I_j$ must exist.

We now show that for some $v \in R$, the set $N(v) \cap I$ satisfies claims (1) and (2) of Theorem
4.5. Note that this completes the proof because $N(v) \cap [I_{j_0} \cap (V - R)] = N(v) \cap I_{j_0}$, since
$v \in R$ and $R$ is an independent set.


Define:
• $R' = \{v \in R : |N(v) \cap I| \ge \delta^2 d/\log_{1+\delta} n\}$.

$R'$ is the set of red vertices such that $N(v) \cap I$ satisfies claim (1) of Theorem 4.5. We first
show that nearly all of the edges from the set $I$ into $R$ enter into $R'$ and then use this to show that
for some $v \in R'$, claim (2) of Theorem 4.5 holds. So, from the definition of $R'$, we have:

$$\begin{aligned}
D_{R'}(I) &\ge D_R(I) - |R|\delta^2 d/\log_{1+\delta} n \\
&\ge D_R(I) - D_R(V - R)\delta^2/\log_{1+\delta} n && \text{(since $D_R(V - R) = D(R) \ge d|R|$)} \\
&\ge D_R(I) - \left(D_R(I)(\log_{1+\delta} n)/\delta\right)\left(\delta^2/\log_{1+\delta} n\right) && \text{(by equation (4.3))} \\
&= D_R(I)(1 - \delta).
\end{aligned}$$

Finally, applying equation (4.4) we have:

$$D_{R'}(I) \ge \lambda(1 - 2\delta) D(I). \qquad (4.5)$$


We now claim that for some $v \in R'$, the set $N(v) \cap I$ satisfies claim (2) of Theorem
4.5. Essentially, the reason for this is that all vertices in $I$ have similar degrees. The actual
proof is by contradiction, using a counting argument.

Suppose for contradiction that:¹

$$\text{For all } v \in R', \quad D_R(N(v) \cap I) < \lambda(1 - 3\delta) D(N(v) \cap I). \qquad \text{(contr 4.6)}$$

If this is the case, then it must also be true that:

$$\sum_{v \in R'} D_{R'}(N(v) \cap I) < \lambda(1 - 3\delta) \sum_{v \in R'} D(N(v) \cap I). \qquad \text{(contr 4.7)}$$

Now, instead of writing each quantity as a sum over $v \in R'$, we would like to write each as
a sum over $w \in I$. We can do this as follows.

We may write the sum $\sum_{v \in R'} D(N(v) \cap I)$ as $\sum_{v \in R'} \left[\sum_{w \in N(v) \cap I} d(w)\right]$ by the
definition of $D$. Now, each vertex $w \in I$ is counted in the inside sum $d_{R'}(w)$ times since $w$
is in the neighborhood of $d_{R'}(w)$ different vertices of $R'$. Thus, $\sum_{v \in R'} D(N(v) \cap I) =
\sum_{w \in I} d_{R'}(w) d(w)$. Similarly, $\sum_{v \in R'} D_{R'}(N(v) \cap I) = \sum_{w \in I} d_{R'}(w)^2$.

Applying the inequality (contr 4.7) we have assumed for contradiction, we get:

$$\begin{aligned}
\sum_{w \in I} d_{R'}(w)^2 &< \lambda(1 - 3\delta) \sum_{w \in I} d_{R'}(w) d(w) \\
&\le \lambda(1 - 3\delta) \sum_{w \in I} d_{R'}(w)(1 + \delta)^{j_0+1} && \text{(since $d(w) < (1 + \delta)^{j_0+1}$ for all $w \in I$)} \\
&= \lambda(1 - 3\delta)(1 + \delta)^{j_0+1} D_{R'}(I). && \text{(by definition of $D_{R'}$)} \qquad (4.8)
\end{aligned}$$

For any collection of values, the average of the squares is at least the square of the
average. Thus:

$$\frac{1}{|I|} \sum_{w \in I} d_{R'}(w)^2 \ge \left(\frac{1}{|I|} \sum_{w \in I} d_{R'}(w)\right)^2 = \frac{D_{R'}(I)^2}{|I|^2}.$$

So, $D_{R'}(I)^2/|I| \le \sum_{w \in I} d_{R'}(w)^2$. Combining this fact with equation (4.8), we have:

$$\frac{D_{R'}(I)^2}{|I|} < \lambda(1 - 3\delta)(1 + \delta)^{j_0+1} D_{R'}(I). \qquad (4.9)$$

Multiplying both sides of equation (4.9) by $|I|/D_{R'}(I)$, we get:

$$\begin{aligned}
D_{R'}(I) &< \lambda(1 - 3\delta)(1 + \delta)^{j_0+1}|I| \\
&\le \lambda(1 - 3\delta)(1 + \delta) D(I) && \text{(since $d(w) \ge (1 + \delta)^{j_0}$ for all $w \in I$)} \\
&< \lambda(1 - 2\delta) D(I).
\end{aligned}$$

This contradicts equation (4.5) and completes the proof of Theorem 4.5. ∎

¹It is always dangerous to display false equations, so we are labeling these inequalities with the symbol
“contr” to emphasize that they are just being assumed for contradiction.


Proof of Theorem 4.6: We are given a set $S$ such that $D_R(S) \ge \lambda' D(S)$; that is,
at least a fraction $\lambda'$ of the edges leaving the set $S$ (double-counting edges with both
endpoints in $S$) enter into $R$. We want to show that at least one of the sets $N_i(S)$ both is
large and has nearly a fraction $\lambda'$ of its vertices in $R$. To do so, we apply Lemma 4.8 where
we have one “box” for each set $N_i(S)$. We place a ball in box $i$ for each endpoint in $N_i(S)$
of an edge from $S$ to $N_i(S)$. A ball is red if the endpoint to which it corresponds is in $R$.
The number of balls in box $i$ is $D_{N_i(S)}(S)$ of which $D_{N_i(S) \cap R}(S)$ are red, and the number
of balls total in the $\log_{1+\delta} n$ boxes is $D(S)$ of which $D_R(S)$ are red. By Lemma 4.8, taking
$\epsilon = \delta$, for some $i_0$ ($0 \le i_0 < \log_{1+\delta} n$),

1. $D_{N_{i_0}(S) \cap R}(S) \ge \delta D_R(S)/(\log_{1+\delta} n)$ and (4.10)
2. $D_{N_{i_0}(S) \cap R}(S)/D_{N_{i_0}(S)}(S) \ge (1 - \delta)\lambda'$. (4.11)

By definition of $N_{i_0}(S)$, each vertex in $N_{i_0}(S)$ is incident to at least $(1 + \delta)^{i_0}$ and less
than $(1 + \delta)^{i_0+1}$ edges from $S$. Thus,

$$D_{N_{i_0}(S) \cap R}(S) \le |N_{i_0}(S) \cap R|(1 + \delta)^{i_0+1}$$

and

$$D_{N_{i_0}(S)}(S) \ge |N_{i_0}(S)|(1 + \delta)^{i_0}$$

which implies that:

$$\begin{aligned}
|N_{i_0}(S) \cap R|/|N_{i_0}(S)| &\ge \left[D_{N_{i_0}(S) \cap R}(S)/D_{N_{i_0}(S)}(S)\right]/(1 + \delta) \\
&\ge (1 - \delta)\lambda'/(1 + \delta) \\
&\ge (1 - 2\delta)\lambda'. \qquad (4.12)
\end{aligned}$$

Equations (4.10) and (4.12) show that the index $i_0$ satisfies both claims of the theorem. ∎
4.4 Applying the vertex-cover approximation


Given a graph H on $N$ vertices, $M$ edges, and with a minimum vertex cover of size $N_{vc}$,
the BE/MS vertex-cover algorithm [4][28] discussed earlier (and also presented as algorithm
Approx-IS in Appendix A) finds a vertex cover of size at most $\left(2 - \frac{\log\log N}{2 \log N}\right) N_{vc}$ in time
$O(NM)$.

If H has an independent set with at least $\frac{1}{2}\left(1 - \frac{1}{\log N}\right) N$ vertices, it must have a vertex
cover of at most $\frac{1}{2}\left(1 + \frac{1}{\log N}\right) N$ vertices. So, the algorithm will find a vertex cover $W \subseteq V(H)$
of size at most:

$$\begin{aligned}
\frac{1}{2}\left(1 + \frac{1}{\log N}\right)\left(2 - \frac{\log\log N}{2 \log N}\right) N
&= \left[1 - \frac{\log\log N}{4 \log N} + \frac{1}{\log N} - \frac{\log\log N}{4 \log^2 N}\right] N \\
&\le \left[1 - \Omega\left(\frac{1}{\log N}\right)\right] N.
\end{aligned}$$

Since $W$ is a vertex cover, $V(H) - W$ is an independent set of size at least $\Omega(N/\log N)$. So,
we have the following lemma.


Lemma 4.9 Given a graph H on $N$ vertices with an independent set of size at least
$\frac{1}{2}\left(1 - \frac{1}{\log N}\right) N$, the BE/MS algorithm can be used to find in polynomial time an independent set of
size $\Omega(N/\log N)$.
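The arithmetic behind this lemma can be evaluated numerically. The Python sketch below computes the upper bound on $|W|/N$ from the product of the two factors; the function names are hypothetical and the use of base-2 logarithms is an assumption of this illustration, since the guarantee is asymptotic.

```python
import math

def cover_fraction_bound(N):
    """Upper bound on |W| / N: (1/2)(1 + 1/log N)(2 - loglog N / (2 log N)), logs base 2."""
    L = math.log2(N)
    return 0.5 * (1 + 1 / L) * (2 - math.log2(L) / (2 * L))

def independent_fraction_bound(N):
    """Guaranteed fraction of vertices left in V(H) - W; negative when the bound is vacuous."""
    return 1 - cover_fraction_bound(N)

for exponent in (8, 16, 64, 256):
    N = 2 ** exponent
    frac = independent_fraction_bound(N)
    # The last column rescales by log N to compare against the Omega(N / log N) guarantee.
    print(exponent, frac, frac * math.log2(N))
```

Under these assumptions the bound is vacuous for small graphs such as $N = 2^8$ and only becomes positive once $\log\log N$ is large enough to outweigh the $1/\log N$ term.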


We now prove the Main Theorem (4.3). 


Proof of Theorem 4.3: Step 1 of algorithm First-Approx ensures that no vertex
has degree less than $f(n)$ for $f(n) = n^{2/5} \log^{8/5} n$. Step 2 ensures that no two vertices
share more than $n/f(n)^2$ neighbors. Applying these values to Corollary 4.7 of the previous
section yields the result that of the $O(n \log^4 n)$ subsets generated in Step 3 of Algorithm
First-Approx, at least one set $T = T_{v,i,j}$ has $\Omega(f(n)^4/(n \log^7 n))$ vertices of which at least a
fraction $\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ are colored red under some legal 3-coloring of G. By Lemma 4.9, since
$\frac{1}{2}\left(1 - \frac{1}{\log n}\right) \ge \frac{1}{2}\left(1 - \frac{1}{\log |T|}\right)$, Step 4 of algorithm First-Approx will find an independent set in $T$
of size $\Omega(f(n)^4/(n \log^8 n))$. We can thus make progress of Type 1 [Large-IS] on some $T_{v,i,j}$
in Step 4 of Algorithm First-Approx so long as:

$$f(n)^4/(n \log^8 n) = \Omega(n/f(n)).$$

Equivalently, we make progress towards an $O(f(n))$-coloring so long as $f(n)^5 = \Omega(n^2 \log^8 n)$,
or $f(n) = \Omega(n^{2/5} \log^{8/5} n)$. Thus, we have proved the Main Theorem. ∎


Chapter 5 


Worst-case bounds for 3-colorable graphs: 
improved algorithm 


In this chapter, we present a procedure that improves on the bounds achieved by Algorithm 
First-Approx given in Chapter 4. The essence of the new algorithm is an improved method 
for forcing expansion (see Section 4.1) and making progress from regions of high density in 
a 3-colorable graph. This improves performance and results in coloring n-vertex 3-colorable 
graphs with only $\tilde{O}(n^{3/8})$ colors.

Algorithm First-Approx performs most poorly when the input graph consists of a collection
of high-density regions or “clumps,” with a lower density of edges between clumps. In
particular, it performs worst when the set $S = N(v) \cap I_j$ has a large fraction of its neighbors
hit by about $n^{0.2}$ edges from vertices in S. Here we present an additional tool for making
progress from such dense regions and thus improve the coloring bound.


5.1 A useful lemma 


We now present a strengthening of Corollary 4.2, described in Lemma 5.1 below, that allows
us to force a 3-colorable graph G to behave in a certain “nice” way. In particular, for any
vertex v of G, for any subset S we select of N(v) of size at least $(n\log^2 n)/f(n)^2$, the lemma
allows us without loss of generality to force S to contain $\Omega(|S|/\log n)$ vertices of each of the two
available colors (that is, the colors that v does not have), or else make progress towards
an f(n)-coloring of G. This will be useful for forcing sets to expand “roughly evenly” into
vertices of the available colors in the graph. As with Corollary 4.2, this lemma requires the
graph to be 3-colorable.


Let f(n) be some nearly-polynomial function. 


Lemma 5.1 Given a set $S \subseteq V(G)$ of size $\Omega((n\log^2 n)/f(n)^2)$, we can either make progress
towards an O(f(n))-coloring of G or else guarantee that under every legal 3-coloring of G,
set S contains less than $\left(1 - \frac{1}{4\log n}\right)|S|$ vertices of any given color class.


The idea of the proof is that if S consists of vertices nearly all of one color, say red, then
its neighborhood should contain mostly blue and green vertices and have few red vertices. If
this occurs, then N(S) will have a large independent set of size $\max\{|N(S) \cap \text{green}|, |N(S) \cap \text{blue}|\}$.
One can thus make progress on N(S) using the BE/MS Vertex-Cover algorithm. The
difficulty with this approach is that the neighborhood N(S) need not have few red vertices.
It could be, for example, that the red vertices in S tend to have a smaller degree than the
others. Or, even if all vertices have the same degree, it could be that edges from the blue
and green vertices of S all enter into different vertices in N(S), but edges from red vertices
in S tend to hit many vertices multiple times. To handle these difficulties, we will run a
procedure separating vertices and neighborhoods into bins depending on degree, in a similar
manner to that done in the proofs of Theorems 4.5 and 4.6.


Proof of Lemma 5.1: 

For convenience, let red be the color with the most vertices in S. The first goal is to find
a large independent set $S' \subseteq S$. We can do this in a greedy fashion by deleting arbitrary
edges from S. That is, begin with S' = S, and while S' is not an independent set, pick
an arbitrary edge (a,b) between two vertices of S' and delete the endpoints from S' (let
$S' \leftarrow S' - \{a,b\}$). If we ever have deleted more than $\frac{|S|}{4\log n}$ edges from S, this means we
must have removed over $\frac{|S|}{4\log n}$ vertices not in red from S (an edge can have at most one
endpoint in red). So, we can guarantee that no color comprises more than $\left(1 - \frac{1}{4\log n}\right)$ of the
vertices of S and halt. Otherwise (we do not delete more than $\frac{|S|}{4\log n}$ edges from S'), we will
end with S' an independent set of size at least $\left(1 - \frac{1}{2\log n}\right)|S|$, which is $\Omega((n\log^2 n)/f(n)^2)$.

Since S' is independent and has size $\Omega((n\log^2 n)/f(n)^2)$, we can make progress Type 2
[Small-Nbhd] towards an O(f(n))-coloring of G if $|N(S')| < (n\log^2 n)/f(n)$, in which case
we halt with “progress made”. Otherwise, let T = N(S'), so $|T| \ge (n\log^2 n)/f(n)$.
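
The greedy edge-deletion step can be sketched as follows. This is a minimal illustration under assumptions: the function name, the adjacency-dictionary representation, and the `budget` parameter (playing the role of the edge-deletion limit in the text) are hypothetical, and the edge is chosen in sorted order only to make the run deterministic.

```python
def greedy_independent_subset(S, adj, budget):
    """Delete both endpoints of edges inside S until S' is independent.

    Returns (S_prime, deleted_edges). If more than `budget` edges had to be
    deleted, returns (None, deleted_edges): in the text this is the case where
    S is known not to be nearly monochromatic, so the procedure halts.
    """
    s_prime = set(S)
    deleted = 0
    while True:
        # Find any edge with both endpoints still in S'.
        edge = next(((a, b) for a in sorted(s_prime)
                     for b in adj.get(a, ()) if b in s_prime), None)
        if edge is None:
            return s_prime, deleted  # S' is independent
        s_prime -= set(edge)
        deleted += 1
        if deleted > budget:
            return None, deleted


# Hypothetical example: S = {0,...,5} with edges 0-1 and 2-3 inside S.
S = {0, 1, 2, 3, 4, 5}
ADJ = {0: {1}, 1: {0}, 2: {3}, 3: {2}, 4: set(), 5: set()}
```

Each deleted edge removes two vertices, at least one of which is not red, which is the counting fact the argument above relies on.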


The basic idea of the procedure now is the following. We first “throw out” edges so
that the vertices in S' have disjoint neighborhoods in T. If at this point all vertices in S'
had the same degree, we would be done: if set S' consisted almost entirely of red vertices,
then set T would consist almost entirely of blue and green vertices. Since the vertices of S'
may have differing degrees, we partition S' into bins based on degree in a similar fashion
as done with the sets $I_j$ defined in Section 3.2. For each bin, either it contains a good
fraction of non-red vertices, or else its neighborhood is mostly blue and green. Thus, if a bin
has many neighbors in T, we can either make progress using the BE/MS algorithm on the
neighborhood or else have a guaranteed number of non-red vertices in S' (recall, our final
goal is to guarantee that S has at least $\frac{1}{4\log n}|S|$ non-red vertices). Formally, we perform
the following steps.


1. For each vertex w in T, arbitrarily mark one of the edges from w into S'. Let E' be
the set of marked edges. Now, for each $v \in S'$, define its marked neighborhood N'(v)
by:

    $N'(v) = \{w \in T \mid (v,w) \in E'\}$.

For any set $A \subseteq S'$, define the marked neighborhood of A similarly to be:

    $N'(A) = \bigcup_{v \in A} N'(v)$.

Note that by definition of E', if A and B are disjoint subsets of S', then their marked
neighborhoods are disjoint as well, because each $w \in T$ is in the marked neighborhood
of only one vertex of S'. (See Figure 5.1.)

Figure 5.1: Vertices in S' have disjoint marked neighborhoods. If the vertices had
nearly identical “marked degree,” then a mostly red set S' would imply a mostly blue
and green set T.


2. Partition S' into subsets such that in each subset, if we consider only the edges in
E', the minimum degree is at least half of the maximum degree. In particular, we
partition S' into sets $S_0, \ldots, S_m$ for $m \le \log n$ such that:

    $S_i = \{v \in S' : |N'(v)| \in [2^i, 2^{i+1} - 1]\}$.

We may ignore vertices in S' with no marked neighbors.

Observation: Notice that if more than a fraction $\left(1 - \frac{1}{2\log n}\right)$ of the vertices of some
$S_i$ are red, then at most $\frac{1}{\log n}$ of the vertices in $N'(S_i)$ can be red, since the non-red
vertices in $S_i$ can have at most twice as large a marked neighborhood in T as the red
vertices do (and, as noted in Step 1, marked neighborhoods of disjoint subsets of S'
are disjoint).


3. Now, pick $i_0$ such that $|N'(S_{i_0})|$ is maximized; so $|N'(S_{i_0})| \ge \frac{|T|}{1 + \log n}$ since there
are at most $(1 + \log n)$ sets $S_i$ and their neighborhoods are disjoint. Note that $i_0$ is
not necessarily the largest index, since lower index sets might have enough vertices to
compensate for having fewer neighbors per vertex.

4. We now apply the BE/MS vertex-cover algorithm (or equivalently, the independent
set approximation algorithm Approx-IS given in Appendix A) to the set $N'(S_{i_0})$. If
it finds an independent set of size $\Omega(n/f(n))$, then we have made progress Type 1
[Large-IS] and can halt with “progress made”.

The reason we apply the BE/MS vertex cover algorithm is that if more than a fraction
$\left(1 - \frac{1}{2\log n}\right)$ of the vertices of $S_{i_0}$ are red, then by the observation in Step 2, $N'(S_{i_0})$ has
at most a $\frac{1}{\log n}$ fraction of its vertices red, so $N'(S_{i_0})$ has an independent set of at least
$\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ of its vertices, namely either $N'(S_{i_0}) \cap \text{blue}$ or $N'(S_{i_0}) \cap \text{green}$, whichever is
larger. Thus, by Lemma 4.9, we find an independent set of size $\Omega(|N'(S_{i_0})|/\log n) = \Omega(n/f(n))$
since we have assumed $|T| \ge (n\log^2 n)/f(n)$ and $|N'(S_{i_0})| \ge \frac{|T|}{1 + \log n}$.

So, if we do not make progress, we know it is not true that more than $\left(1 - \frac{1}{2\log n}\right)$ of
the vertices of $S_{i_0}$ are red.

5. If we did not make progress in step 4, we know that at least $\frac{1}{2\log n}$ of the vertices in
$S_{i_0}$ are blue or green. Now, let $S' \leftarrow S' - S_{i_0}$ and let T = N(S').

If S' has not been reduced to less than 1/3 its original size, then go back to Step 1.
Notice that in this case, we may still assume that $|T| \ge (n\log^2 n)/f(n)$ since S' still
has size $\Omega((n\log^2 n)/f(n)^2)$.

If S' is less than 1/3 its original size, then go on to Step 6.

6. If we reach this step, it means we have reduced S' to less than a third of its original
size, and have done so by removing from S' sets containing at least a $\frac{1}{2\log n}$ fraction of
blue and green vertices. Since S' originally had size at least $\left(1 - \frac{1}{2\log n}\right)|S|$, this implies
we must have removed more than:

    $\frac{2}{3} \cdot \frac{1}{2\log n}\left(1 - \frac{1}{2\log n}\right)|S| > \frac{1}{4\log n}|S|$

blue and green vertices from S. So, we may halt with the guarantee asked for in
the statement of the lemma since set S could not possibly have contained more than
$\left(1 - \frac{1}{4\log n}\right)|S|$ red vertices.
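
Steps 1 through 3 of the procedure above (marking one edge per vertex of T, binning S' by marked degree, and picking the bin with the largest marked neighborhood) can be sketched as follows. The function names and the small example are assumptions made for illustration; the "arbitrary" marked edge is chosen as the smallest-numbered neighbor so the run is deterministic, and every vertex of T is assumed to have a neighbor in S'.

```python
def marked_neighborhoods(S_prime, T, adj):
    """Step 1: each w in T marks one edge into S'; returns N'(v) for each v in S'."""
    marked = {v: set() for v in S_prime}
    for w in T:
        owner = min(v for v in adj[w] if v in S_prime)  # one marked edge per w
        marked[owner].add(w)
    return marked


def degree_bins(marked):
    """Step 2: S_i = {v : 2^i <= |N'(v)| <= 2^(i+1) - 1}; empty N'(v) is ignored."""
    bins = {}
    for v, nbrs in marked.items():
        if nbrs:
            bins.setdefault(len(nbrs).bit_length() - 1, set()).add(v)
    return bins


def largest_bin(bins, marked):
    """Step 3: the index i0 whose bin has the largest marked neighborhood."""
    return max(bins, key=lambda i: sum(len(marked[v]) for v in bins[i]))


# Hypothetical example: S' = {0, 1, 2}, T = {10, ..., 16}.
S_PRIME = {0, 1, 2}
T = {10, 11, 12, 13, 14, 15, 16}
ADJ = {10: {0}, 11: {0, 1}, 12: {0}, 13: {0, 2}, 14: {1}, 15: {1}, 16: {2}}
MARKED = marked_neighborhoods(S_PRIME, T, ADJ)
BINS = degree_bins(MARKED)
I0 = largest_bin(BINS, MARKED)
```

Because every vertex of T marks exactly one edge, the marked neighborhoods partition T, which is what makes the pigeonhole bound on the largest bin in Step 3 valid.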




5.2 Making progress from dense regions 


We will now use Lemma 5.1 to help take advantage of certain types of dense regions in 
3-colorable graphs. In particular, we consider the case of two sets of vertices S and T where 
S is 2-colored under some legal 3-coloring of G and the number of edges between S and 
T is large compared with the sizes of the two sets. This occurs when S is a subset of the 
neighborhood of a vertex (e.g., a set $N(v) \cap I_j$) and T is some set $N_i(S)$ for a large i (see
Section 3.2). 


Theorem 5.2 Given sets of vertices S and T in an n-vertex 3-colorable graph G, such that

1. S is 2-colored under some legal 3-coloring of G,
2. $D_T(S) = \Omega(|S|(n\log^2 n)/f(n)^2)$, and
3. $[D_T(S)]^3 = \Omega\left(\left[|S| + \max_{v \in S} d_T(v)\right] \times \left[|S||T|^2(n\log n)/f(n)^2 + |T||S|^2 n^2/f(n)^4\right]\right)$,

then we can make progress towards an O(f(n))-coloring of G.


Before proving this theorem, let us first make sense of the condition on $[D_T(S)]^3$ by
considering a few examples. Suppose we wish to color with $f(n) = n^{3/8}$ colors, the set S
has size $n^{3/8}$, and each vertex v in S has degree $n^{3/8}$ into T. Then, $D_T(S)/|S| = n^{3/8}$, which is
greater than $n^{1/4}\log^2 n$ (condition 2). The main condition (condition 3) reduces to:

    $n^{18/8} \ge c\,n^{3/8}\left[|T|^2 n^{5/8}\log n + |T|\,n^{10/8}\right]$.

Ignoring logarithmic factors, the theorem assures us we make progress if $|T| = O(n^{5/8})$. This
is the basic idea for the $O(n^{3/8}\log^{5/2} n)$-coloring algorithm described later. For that application
of this theorem, if T has $\Omega(n^{5/8})$ vertices, we will be able to find a large independent
set inside T, and thus make progress of Type 1.

As another example, if we wished to color with $n^{0.35}$ colors, S had size $n^{0.35}$ and each
vertex in S had degree $n^{0.35}$ into T, then the main condition reduces to

    $n^{2.1} \ge c\,n^{0.35}\left[|T|^2 n^{0.65}\log n + |T|\,n^{1.3}\right]$.

In this case, we only make progress if $|T| = O(n^{0.45})$ (here the $|T|n^{1.3}$ term is dominant).
However, we do not know how to make use of forcing $|T| = \Omega(n^{0.45})$.
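
The two examples can be checked mechanically by tracking only exponents of n. The sketch below is an illustration under stated assumptions: logarithmic factors are ignored, condition 3 is read as having the two terms $|S||T|^2 n/f(n)^2$ and $|T||S|^2 n^2/f(n)^4$, and the function name `max_T_exponent` is hypothetical.

```python
from fractions import Fraction as F


def max_T_exponent(a, s, d):
    """Largest t such that |T| = n^t satisfies condition 3, ignoring log factors.

    a: f(n) = n^a,  s: |S| = n^s,  d: every vertex of S has degree n^d into T.
    Condition 3 in exponents:
        3(s + d) >= max(s, d) + max(s + 2t + 1 - 2a,  t + 2s + 2 - 4a).
    """
    slack = 3 * (s + d) - max(s, d)
    from_first_term = (slack - s - 1 + 2 * a) / 2
    from_second_term = slack - 2 * s - 2 + 4 * a
    return min(from_first_term, from_second_term)


FIRST_EXAMPLE = max_T_exponent(F(3, 8), F(3, 8), F(3, 8))
SECOND_EXAMPLE = max_T_exponent(F(7, 20), F(7, 20), F(7, 20))
```

In the first example both terms give the same threshold; in the second the $|T||S|^2 n^2/f(n)^4$ term is the binding one, matching the remark that it is dominant.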


Proof of Theorem 5.2: For convenience, let blue and green be the two colors that
appear in S, and let us define the following notation.

• Let $D_{total} = D_T(S)$.

• Let $d_{avg} = D_{total}/|S|$ be the average degree into T of vertices in S.

We want to keep track of those vertices of T that have a reasonably large degree into S, so
we define a subset T' of T by:

• $T' = \{w \in T \mid d_S(w) \ge \frac{1}{2} D_{total}/|T|\}$.

Since $D_S(T - T') < |T|\left[\frac{1}{2} D_{total}/|T|\right]$, we have $D_S(T') \ge \frac{1}{2} D_{total}$, or equivalently,

    $D_{T'}(S) \ge D_{total}/2$. (5.1)

We also want to look at those vertices in S that have reasonably large degree into T', so
define:

• $S' = \{v \in S \mid d_{T'}(v) \ge \frac{1}{2} D_{T'}(S)/|S|\}$.

Since $D_{T'}(S - S') < |S|\left[\frac{1}{2} D_{T'}(S)/|S|\right]$, we have: $D_{T'}(S') \ge \frac{1}{2} D_{T'}(S)$, which by equation (5.1)
implies:

    $D_{T'}(S') \ge D_{total}/4$. (5.2)

Also, by definition of S' and equation (5.1), if $v \in S'$ then $d_{T'}(v) \ge \frac{1}{4} D_{total}/|S|$, or equivalently,

    $d_{T'}(v) \ge \frac{1}{4} d_{avg}$ for all $v \in S'$. (5.3)

Since we are given (condition 2) that $d_{avg} = \Omega((n\log^2 n)/f(n)^2)$, this implies that all $v \in S'$
have $d_T(v) \ge d_{T'}(v) = \Omega((n\log^2 n)/f(n)^2)$. Thus, by Lemma 5.1 (applied to the sets
$N(v) \cap T$), we can guarantee that each vertex $v \in S'$ has at least a fraction $\frac{1}{4\log n}$ of its
edges into T entering into non-red vertices.

So, for some non-red color, say green without loss of generality, at least $D_T(S')/(8\log n)$
edges from S' enter into green vertices of T. This implies that some green vertex $g \in T$ has
degree at least $D_T(S')/(8|T|\log n)$ into S'. Now, define (see Figure 5.2):

• $X = N(g) \cap S'$.

• $Y = N(X) \cap T'$.

So, we have:

    $|X| \ge \frac{1}{8} D_T(S')/(|T|\log n) \ge \frac{1}{32} D_{total}/(|T|\log n) = \Omega\left(\frac{|S|}{|T|} \cdot \frac{d_{avg}}{\log n}\right)$. (5.4)

Note that set X consists entirely of blue vertices, and since Y is in the neighborhood of a
blue set, Y contains only red and green vertices. We want to show that Y is large, because
we will later intersect Y with a red and blue set to get a large monochromatic (red) set,
which will allow us to make progress. We show that Y must be large as follows.

Figure 5.2: Vertex g and the sets X and Y. Also, green vertex $g' \in S$ (defined later) and
the intersecting neighborhoods.

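By Theorem 4.1 we may assume that no two vertices of X share more than $n/f(n)^2$
neighbors in T'. Now suppose that $|X| \le \frac{f(n)^2}{n}\left(\frac{1}{8} d_{avg}\right)$. In this case, each vertex $v \in X$
can share at most $|X|(n/f(n)^2) \le \frac{1}{8} d_{avg}$ neighbors with all of the other vertices in X. This
implies, by equation (5.3), that v must have at least $\frac{1}{8} d_{avg}$ neighbors in T' not shared with
any other vertices of X. So, set Y must have size at least $\Omega(|X| d_{avg})$.

If $|X| > \frac{f(n)^2}{n}\left(\frac{1}{8} d_{avg}\right)$, then if we only consider the first $\frac{f(n)^2}{n}\left(\frac{1}{8} d_{avg}\right)$ of the vertices of X,
we still get that $|Y| = \Omega\left(\frac{f(n)^2}{n}(d_{avg})^2\right)$. So, whichever case occurs, we have:

    $|Y| = \Omega\left(\min\left\{|X| d_{avg},\ \frac{f(n)^2}{n}(d_{avg})^2\right\}\right)$. (5.5)

By definition, Y is a subset of T' and vertices of T' all have a high degree into S. So, we
can lower bound the degree of Y into S by:

    $D_S(Y) \ge \left(\frac{1}{2}\frac{D_{total}}{|T|}\right)|Y|$
    $= \frac{1}{2}\frac{|S|}{|T|} d_{avg}|Y|$
    $= \Omega\left(\min\left\{|X|\frac{|S|}{|T|}(d_{avg})^2,\ \frac{f(n)^2}{n}(d_{avg})^3\frac{|S|}{|T|}\right\}\right)$ (by equation 5.5)
    $= \Omega\left(\min\left\{\frac{|S|^2}{|T|^2}(d_{avg})^3/\log n,\ \frac{f(n)^2}{n}(d_{avg})^3\frac{|S|}{|T|}\right\}\right)$. (by equation 5.4) (5.6)

Now we apply condition 3 in the statement of the theorem. The condition (dividing both
sides by $|S|^3$) states that $(d_{avg})^3 = \left[|S| + \max_{v \in S} d_T(v)\right] \cdot \Omega\left(\frac{|T|^2}{|S|^2}\frac{n}{f(n)^2}\log n + \frac{|T|}{|S|}\frac{n^2}{f(n)^4}\right)$. So,
this implies both that:

    $\frac{|S|^2}{|T|^2}(d_{avg})^3/\log n = \left[|S| + \max_{v \in S} d_T(v)\right] \cdot \Omega\left(\frac{n}{f(n)^2}\right)$ (5.7)

and

    $\frac{f(n)^2}{n}(d_{avg})^3\frac{|S|}{|T|} = \left[|S| + \max_{v \in S} d_T(v)\right] \cdot \Omega\left(\frac{n}{f(n)^2}\right)$. (5.8)

Thus, combining both equations (5.7) and (5.8) with equation (5.6), we get:

    $D_S(Y) = \Omega\left(\frac{n}{f(n)^2}\left[|S| + \max_{v \in S} d_T(v)\right]\right)$. (5.9)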


It now must be that one of the following two cases occurs. The first case is that there
is some green vertex $g' \in S$ in the neighborhood of more than $\frac{1}{2} D_S(Y)/|S|$ vertices of Y.
In this case, according to equation (5.9), it must be that $d_Y(g') = \Omega(n/f(n)^2)$. So,
$N(g') \cap Y$ is a set of $\Omega(n/f(n)^2)$ vertices, all of which are red since $N(g') \subseteq \text{blue} \cup \text{red}$ and
$Y \subseteq \text{red} \cup \text{green}$; see Figure 5.2. Thus, we can make progress on this monochromatic set
using Corollary 4.2.

The other possibility is that no green vertex in S is in the neighborhood of more than
$\frac{1}{2} D_S(Y)/|S|$ vertices of Y. In this case, the set of all vertices in S hit by more than
$\frac{1}{2} D_S(Y)/|S|$ edges from Y is all blue. Define Z to be that set; that is:

• $Z = \{v \in S \mid d_Y(v) > \frac{1}{2} D_S(Y)/|S|\}$.

Clearly, the number of edges between vertices of Y and vertices in (S - Z) is at most
$|S|\left(\frac{1}{2} D_S(Y)/|S|\right) = \frac{1}{2} D_S(Y)$. So, $D_Z(Y) \ge \frac{1}{2} D_S(Y)$. Thus, we can bound the size of Z by:

    $|Z| \ge \frac{1}{2} D_S(Y)/\max_{v \in Z} d_Y(v)$
    $\ge \frac{1}{2} D_S(Y)/\max_{v \in S} d_T(v)$

which by equation (5.9) implies:

    $|Z| = \Omega(n/f(n)^2)$.

Since Z is monochromatic (blue) we can now use Corollary 4.2 to make progress. So,
whichever of the two cases occurs, we have made progress towards an O(f(n))-coloring.


The final algorithm for making progress given our sets S and T is as follows:

Algorithm Dense-Region-Progress:

Given: Sets S and T satisfying the conditions of Theorem 5.2 in some graph G.

Output: Progress towards an O(f(n))-coloring of G.

1. Run the algorithm of Lemma 5.1 on $N(v) \cap T$ for all $v \in S$. If any runs make
progress towards an O(f(n))-coloring, then halt. Otherwise, we know there are
many edges from S into red, blue, and green vertices of T under any legal
3-coloring of G.

2. If for some pair of vertices $u, v \in S$, we have $|N(u) \cap N(v)| > n/f(n)^2$, then use
Theorem 4.1 to make progress.

3. Otherwise, for each vertex $v \in T$,
(a) let $Y = N(N(v) \cap S) \cap T$ and let $Z = \{w \in S : d_Y(w) \ge n/f(n)^2\}$.
(Note that we do not need to use the sets S' and T'; they were just
convenient for the analysis.)
(b) Run the algorithm of Corollary 4.2 on Z.
(c) For each $w \in Z$, run the algorithm of Corollary 4.2 on $Y \cap N(w)$.

The above proof guarantees that this algorithm makes progress. □
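
Step 3(a) of the algorithm is plain set manipulation, which the following sketch makes explicit. The function name, the adjacency-dictionary representation, and the `threshold` parameter (standing in for $n/f(n)^2$) are assumptions for illustration; the calls to Corollary 4.2 in steps 3(b) and 3(c) are not modeled.

```python
def candidate_sets(v, S, T, adj, threshold):
    """For a vertex v of T, compute Y = N(N(v) & S) & T and
    Z = {w in S : w has at least `threshold` neighbors in Y}."""
    X = adj[v] & S
    Y = set().union(*(adj[x] for x in X)) & T
    Z = {w for w in S if len(adj[w] & Y) >= threshold}
    return Y, Z


# Hypothetical example with S = {0, 1, 2} and T = {10, 11, 12}.
S = {0, 1, 2}
T = {10, 11, 12}
ADJ = {
    0: {10, 11}, 1: {10, 11, 12}, 2: {12},
    10: {0, 1}, 11: {0, 1}, 12: {1, 2},
}
Y, Z = candidate_sets(10, S, T, ADJ, threshold=2)
```

Trying every v in T avoids having to identify the particular green vertex g used in the analysis.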


5.3 The coloring algorithm 


We now combine algorithms First-Approx and Dense-Region-Progress to get an improved 


algorithm guaranteed to $\tilde{O}(n^{3/8})$-color any n-vertex 3-colorable graph.


Algorithm Improved-Approx:

Given: G = (V,E), a 3-colorable graph on n vertices. Let $f(n) = n^{3/8}(\log n)^{5/2}$.

Output: Progress towards an O(f(n))-coloring of G.

1. For each vertex v, if d(v) < f(n), make progress Type 2 [Small-Nbhd].

2. Otherwise, for each vertex v, for each $i, j \in \{0, 1, \ldots, 5(\log n)^2\}$:

(a) Let $S = N(v) \cap I_j$.

(b) Let $T = N_i(S)$.

(c) If $|T| \ge n^{5/8}/(\log n)^{3/2}$, run the BE/MS Vertex-Cover approximation algorithm.
If we find an independent set of size at least n/f(n), we have made
progress Type 1 [Large-IS].

(d) If S and T satisfy the conditions of Theorem 5.2, then make progress using
Algorithm Dense-Region-Progress.


Theorem 5.3 Algorithm Improved-Approx will make progress towards an $O(n^{3/8}(\log n)^{5/2})$-coloring
of any n-vertex 3-colorable graph.


Proof: Assume Algorithm Improved-Approx does not make progress in Step 1. So, we know
that the minimum degree $d \ge f(n) = n^{3/8}(\log n)^{5/2}$. As in Chapter 4, let R = red be the
color class with D(red) = max(D(red), D(blue), D(green)).

We now apply some of the facts proven in Section 4.3.2. Theorem 4.5 guarantees us
that for some vertex $v \in R$ and some index j, the set $S = N(v) \cap I_j$ in Step 2(a) has the
property that:

    $|S| \ge \delta^2 f(n)/\log_{1+\delta} n$, and (5.10)
    $D_R(S) \ge \frac{1}{2}(1 - 3\delta)D(S)$, (5.11)

where $\delta = \frac{1}{5\log n}$. Note that for the given value of f, equation (5.10) and the definition of $\delta$
imply that:

    $|S| = \Omega(n^{3/8}/(\log n)^{3/2})$. (5.12)

Theorem 4.6 (using $\lambda' = \frac{1}{2}(1 - 3\delta)$) shows that for some index i, the set $T = N_i(S)$ of step
2(b) has the property that:

    $D_{T \cap R}(S) \ge \delta D_R(S)/\log_{1+\delta} n$, and (5.13)
    $|T \cap R|/|T| \ge \frac{1}{2}(1 - 2\delta)(1 - 3\delta)$. (5.14)

Let us now, for the rest of the proof, fix two such sets S and T satisfying equations (5.10)
through (5.14). We now show that these equations and the definitions of S and T will
ensure success of the algorithm.

Suppose first that $|T| \ge n^{5/8}/(\log n)^{3/2}$. By equation (5.14) above, set T contains an
independent set ($T \cap R$) of at least a fraction $\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ of its vertices (using $\delta = \frac{1}{5\log n}$).
So by Lemma 4.9, the BE/MS vertex-cover algorithm finds an independent set of size
$\Omega(n^{5/8}/(\log n)^{5/2}) = \Omega(n/f(n))$ so we make progress Type 1 [Large-IS] in Step 2(c).

On the other hand, if $|T| < n^{5/8}/(\log n)^{3/2}$, then we just need to show that S and T
satisfy the conditions of Theorem 5.2. Clearly, S is 2-colored under any legal 3-coloring
of G since $S \subseteq N(v)$, so Condition 1 is satisfied. For $f(n) = n^{3/8}(\log n)^{5/2}$, Condition 2
reduces to $D_T(S)/|S| = \Omega(n^{1/4}/(\log n)^3)$, which is found to be easily met using equations
(5.11) and (5.13) as follows.

    $D_T(S) \ge D_{T \cap R}(S) = \Omega(D(S)/(\log n)^3)$ (5.15)
    $= \Omega(d|S|/(\log n)^3)$. (5.16)

So,

    $D_T(S)/|S| \ge \Omega(n^{3/8}/(\log n)^{1/2})$ (5.17)
    $= \Omega(n^{1/4}/(\log n)^3)$. (5.18)


The last task is to show that Condition 3 is satisfied, which for the given value of f,
reduces to the requirement that

    $[D_T(S)]^3 = \Omega\left(\left[|S| + \max_{v \in S} d_T(v)\right]\left(|S||T|^2\frac{n^{1/4}}{(\log n)^4} + |T||S|^2\frac{n^{1/2}}{(\log n)^{10}}\right)\right)$. (5.19)

To show that this requirement holds, we upper bound the quantities |S|, |T|, and
$\max_{v \in S} d_T(v)$.

From equation (5.17), we have

    $|S| = O((\log n)^{1/2} D_T(S)/n^{3/8})$. (5.20)

Next, our very condition for this case was that:

    $|T| = O(n^{5/8}/(\log n)^{3/2})$. (5.21)

Finally, since $S \subseteq I_j$, so all vertices of S have nearly the same degree (though not necessarily
the same degree into T), we can bound $\max_{v \in S} d_T(v)$ as follows:

    $\max_{v \in S} d_T(v) = O(D(S)/|S|)$
    $= O(D_T(S)(\log n)^3/|S|)$ (using equation 5.15)
    $= O(D_T(S)(\log n)^3(\log n)^{3/2}/n^{3/8})$ (using equation 5.12)
    $= O(D_T(S)(\log n)^{9/2}/n^{3/8})$. (5.22)

The three equations (5.20), (5.21), and (5.22) allow us to reduce requirement (5.19) to the
condition that:

    $[D_T(S)]^3 = \Omega\left(\left[(\log n)^{9/2}\frac{2 D_T(S)}{n^{3/8}}\right] \cdot \left[D_T(S)\frac{n^{9/8}}{(\log n)^{13/2}} + D_T(S)^2\frac{n^{3/8}}{(\log n)^{21/2}}\right]\right)$
    $= \Omega\left(D_T(S)^2\frac{n^{3/4}}{(\log n)^2} + \frac{D_T(S)^3}{(\log n)^6}\right)$.

Equivalently, we just have the requirement that $D_T(S) = \Omega(n^{3/4}/(\log n)^2 + D_T(S)/(\log n)^6)$.
Clearly, $D_T(S) = \Omega(D_T(S)/(\log n)^6)$ so we simply need $D_T(S) = \Omega(n^{3/4}/(\log n)^2)$. We
are now done, because combining equations (5.17) and (5.12) yields:

    $D_T(S) = \Omega(|S|\,n^{3/8}/(\log n)^{1/2})$
    $= \Omega(n^{3/4}/(\log n)^2)$.

Thus, Step 2(d) of Algorithm Improved-Approx makes progress. □
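
The bookkeeping of powers of n and log n in the last part of this proof is easy to get wrong by hand, so a small exponent calculator can serve as a check. The sketch below is illustrative and rests on assumptions: it takes $f(n) = n^{3/8}(\log n)^{5/2}$, reads the three upper bounds as $|S| = O((\log n)^{1/2} D_T(S)/n^{3/8})$, $|T| = O(n^{5/8}/(\log n)^{3/2})$ and $\max_v d_T(v) = O(D_T(S)(\log n)^{9/2}/n^{3/8})$, and represents each quantity by a hypothetical triple (power of $D_T(S)$, power of n, power of log n).

```python
from fractions import Fraction as F


def mul(*terms):
    """Multiply quantities given as (D-power, n-power, log-power) triples."""
    return tuple(sum(t[i] for t in terms) for i in range(3))


S_BOUND = (1, F(-3, 8), F(1, 2))   # |S|, as in (5.20)
T_BOUND = (0, F(5, 8), F(-3, 2))   # |T|, as in (5.21)
MAX_DEG = (1, F(-3, 8), F(9, 2))   # max d_T(v), as in (5.22); dominates |S|
COEF_1 = (0, F(1, 4), -4)          # n log n / f(n)^2
COEF_2 = (0, F(1, 2), -10)         # n^2 / f(n)^4

# Right-hand side of condition 3: [max d_T] * (|S||T|^2 coef1 + |T||S|^2 coef2)
FIRST = mul(MAX_DEG, S_BOUND, T_BOUND, T_BOUND, COEF_1)
SECOND = mul(MAX_DEG, T_BOUND, S_BOUND, S_BOUND, COEF_2)
```

The two products come out as $D_T(S)^2 n^{3/4}/(\log n)^2$ and $D_T(S)^3/(\log n)^6$, which are the two terms of the reduced requirement.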


Chapter 6 


Worst-case bounds for k-colorable graphs 


We now consider two different methods for using the preceding techniques developed for 
3-colorable graphs to improve the bounds for approximately coloring k-colorable graphs for 
fixed k > 3. One method is simply to use the preceding algorithms as an improved base case 
for a recursive strategy used by Wigderson [43]. A second method is to directly extend the 
above algorithms for k > 3. For the latter approach, one needs both an analog of the shared 
neighborhood condition (Theorem 4.1), and a way to cascade together several applications 
of the distance-2 neighbor-taking process (Step 3 of Algorithm First-Approx) so that we can 
“pump up” the relative size of the largest independent set. We will see that the second 
method yields better asymptotic bounds than the first, though with diminishing returns 
as k increases. However, the running time of the second method grows as (n log? n)?#+°() 
while the running time of the first is dominated just by the time taken by the base-case 
algorithm. The two methods can be combined, providing a time/performance tradeoff, by 
choosing some ky and using the second method as a base case for the first method for k > ko. 
This will result in an algorithm with running time O((n log? n)?*et*) for some constant c. 
The results of these approaches are summarized (in “Õ” notation) in Table 6.1. The
first row shows the bound for using Wigderson’s algorithm with base case at k = 2. The 
second and third rows show how the bounds are improved when we use the new coloring 
method as base cases for k = 3 and k = 4 respectively. The last row shows the best bounds 
we can get using the direct extension. Note: the bounds in the last two rows are with 
high probability over the coin tosses of the algorithm. See Corollaries 6.2 and 6.7 for more 


precise bounds. 


6.1 A simple recursive approach 


A standard method [43][6][22] to approximately color k-colorable graphs is to pick a vertex 
of high degree and recursively try to color its (k — 1)-colorable set of neighbors with as few 
colors as possible. When we get to a 2-colorable set, we can just directly 2-color that set 
in the standard way. For example, Wigderson’s algorithm for coloring k-colorable graphs 


with $k n^{1-1/(k-1)}$ colors can be described as follows:


                  k = 3      4          5           6           7           general
Wigderson [43]    n^{1/2}    n^{2/3}    n^{3/4}     n^{4/5}     n^{5/6}     n^{1-1/(k-1)}
base: k = 3       n^{3/8}    n^{8/13}   n^{13/18}   n^{18/23}   n^{23/28}   n^{1-1/(k-7/5)}
base: k = 4                  n^{3/5}    n^{5/7}     n^{7/9}     n^{9/11}    n^{1-1/(k-3/2)}
best we have      n^{3/8}    n^{3/5}                                        n^{α(k)}

Table 6.1: Summary of results in “Õ” notation for various combinations of algorithms.
Items “base: k = 3” and “base: k = 4” correspond to using Algorithm Recursive-Color
with Algorithm Multi-Stage-Color as a base case for k = 3 or 4 respectively.


Wigderson’s Algorithm for k-colorable graphs:

Given: A k-colorable graph G on n vertices.

Output: A coloring with at most $k n^{1-1/(k-1)}$ colors.

1. If there exists a vertex v with at least $n^{1-1/(k-1)}$ neighbors, then color the
neighborhood recursively with
$(k-1)\left(n^{1-1/(k-1)}\right)^{1-1/(k-2)} = (k-1)\left(n^{\frac{k-2}{k-1}}\right)^{\frac{k-3}{k-2}} = (k-1)n^{\frac{k-3}{k-1}}$
colors. Then remove those nodes from the graph and the colors from
the palette.

Note that this step can be executed at most $n^{1/(k-1)}$ times, resulting in a total
of $(k-1)n^{\frac{k-3}{k-1} + \frac{1}{k-1}} = (k-1)n^{1-1/(k-1)}$ colors used in this step.

2. Otherwise, greedily color the graph left with $n^{1-1/(k-1)}$ colors.

So, the total number of colors used in both steps together is $k n^{1-1/(k-1)}$.

(Note that for the base case of k = 2, we have $2 = 2n^{1-1/(2-1)}$.)
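
For k = 3 the recursion bottoms out immediately, since the neighborhood of any vertex in a 3-colorable graph is 2-colorable. The sketch below illustrates that special case only; it is a minimal rendering under assumptions, not a transcription of [43]: the function names and the test graph are hypothetical, the degree threshold is taken as the integer part of $\sqrt{n}$, and the input is assumed to be 3-colorable (otherwise the 2-coloring of a neighborhood may be improper).

```python
import math
from collections import deque


def wigderson_3color(n, adj):
    """Color a 3-colorable graph on vertices 0..n-1 with O(sqrt(n)) colors."""
    threshold = max(1, math.isqrt(n))
    color, alive, next_color = {}, set(range(n)), 0
    while True:
        v = next((u for u in alive if len(adj[u] & alive) >= threshold), None)
        if v is None:
            break
        nbrs = adj[v] & alive
        side = {}
        for start in nbrs:  # BFS 2-coloring of the (bipartite) neighborhood
            if start in side:
                continue
            side[start] = 0
            queue = deque([start])
            while queue:
                x = queue.popleft()
                for y in adj[x] & nbrs:
                    if y not in side:
                        side[y] = 1 - side[x]
                        queue.append(y)
        for x in nbrs:
            color[x] = next_color + side[x]
        next_color += 2      # retire the two colors just used
        alive -= nbrs        # remove the colored neighborhood
    for u in alive:          # remaining degrees are below the threshold: greedy
        used = {color[w] for w in adj[u] if w in alive and w in color}
        c = next_color
        while c in used:
            c += 1
        color[u] = c
    return color


def demo_graph(n=30):
    """Hypothetical 3-colorable test graph: vertex i lies in class i % 3."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if i % 3 != j % 3 and (i + 2 * j) % 4 != 0:
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

Each round of the loop removes at least `threshold` vertices and spends two colors, and the greedy phase needs at most `threshold` colors, which gives the $O(\sqrt{n})$ total.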


The algorithms presented in the previous chapters allow one to stop at k = 3 as a base
case instead of k = 2 in this type of procedure and thus use fewer colors. More generally,
we can describe when a bound achieved for coloring graphs of chromatic number $k_0$ will
improve the performance of this kind of recursive procedure for graphs of higher chromatic
number. In particular, suppose we have an algorithm A to color any n-vertex $k_0$-colorable
graph with $\tilde{O}(n^\alpha)$ colors. Then, the important quantity for this approach, which we call
the recursive performance r(A) of the algorithm, is:

    $r(A) = k_0 - \frac{1}{1-\alpha}$. (6.1)

If an algorithm has a higher value of r, then the bounds achieved by using that as a base
case for $k > k_0$ will be improved. Specifically, the recursive algorithm will color k-colorable
graphs for $k \ge k_0$ with $\tilde{O}(n^{1-1/(k-r(A))})$ colors. So, for example, using the fact that we can
2-color 2-colorable graphs ($k_0 = 2$, $\alpha = 0$), we find r = 1 and the bound is $O(n^{1-1/(k-1)})$.
Using the improved bounds for coloring 3-colorable graphs in Chapter 5 ($k_0 = 3$, $\alpha = 3/8$),
we get $r = 3 - \frac{1}{1-3/8} = 7/5$, so the improved bound for $k \ge 3$ is:

    $\tilde{O}\left(n^{1-\frac{1}{k-7/5}}\right)$ colors. (6.2)

Later, in Section 6.2, we will see how to color 4-colorable graphs with $\tilde{O}(n^{3/5})$ colors, so we
get $r = 4 - \frac{1}{1-3/5} = 3/2$. Thus, for $k \ge 4$, we can color with $\tilde{O}\left(n^{1-\frac{1}{k-3/2}}\right)$ colors.
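
The recursive performance and the resulting exponents are simple enough to compute exactly. The following sketch, with hypothetical function names and exact rational arithmetic, evaluates formula (6.1) and the exponent $1 - 1/(k - r)$ for the base cases discussed here, taking $\alpha = 3/5$ for 4-colorable graphs as the value that yields r = 3/2.

```python
from fractions import Fraction as F


def recursive_performance(k0, alpha):
    """r(A) = k0 - 1/(1 - alpha) for a base algorithm using about n^alpha colors."""
    return k0 - 1 / (1 - F(alpha))


def color_exponent(k, r):
    """Exponent e such that the recursion colors k-colorable graphs with about n^e colors."""
    return 1 - 1 / (k - F(r))


R_WIGDERSON = recursive_performance(2, 0)       # base case k0 = 2
R_BASE_3 = recursive_performance(3, F(3, 8))    # base case k0 = 3
R_BASE_4 = recursive_performance(4, F(3, 5))    # base case k0 = 4
```

For instance, `color_exponent(4, R_BASE_3)` gives the exponent obtained for 4-colorable graphs when the recursion stops at the 3-colorable algorithm, which can then be compared with the direct 3/5 bound.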
The following theorem more precisely describes the bounds achieved by the recursive 


approach. 


Theorem 6.1 Given an algorithm A to color any m-vertex $k_0$-colorable graph with $c\,m^\alpha\log^\beta m$
colors, then algorithm Recursive-Color(A) below can color any n-vertex k-colorable graph
($k \ge k_0$) with at most:

    $C_k(n) = [c + (k - k_0)]\,n^{1-1/(k-r)}(\log n)^{\beta(k_0-r)/(k-r)}$ (6.3)

colors, where $r = r(A) = k_0 - \frac{1}{1-\alpha}$.


Using Theorem 6.1 and the bounds achieved by algorithm Improved-Approx ($k_0 = 3$, $\alpha = 3/8$,
$\beta = 5/2$), we can restate formula (6.2) more precisely in the following corollary.

Corollary 6.2 Algorithm Recursive-Color(Improved-Approx) colors any n-vertex k-colorable
graph ($k \ge 3$) with at most

    $O\left(n^{1-\frac{1}{k-7/5}}(\log n)^{\frac{4}{k-7/5}}\right)$

colors.
The recursive algorithm to achieve these bounds is described below. 


Algorithm Recursive-Color: (Variant on Wigderson’s algorithm)

Given: An n-vertex k-colorable graph G and an algorithm A to color any m-vertex
$k_0$-colorable graph with at most $C_{k_0}(m) = c\,m^\alpha\log^\beta m$ colors ($k_0 \le k$).

Output: A $C_k(n)$-coloring of G, for $C_k(n)$ as defined in equation (6.3).

1. Let $r = k_0 - \frac{1}{1-\alpha}$.

2. Let $f(n,k) = n^{1-1/(k-r)}(\log n)^{\beta(k_0-r)/(k-r)}$.

3. While there exists a vertex with at least f(n,k) neighbors, select f(n,k) of its
neighbors and color them with $C_{k-1}(f(n,k))$ colors. Remove those nodes from
the graph and the colors from the palette.

Note that we can execute this step at most n/f(n,k) times.

4. Otherwise, greedily color the graph with f(n,k) colors.


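Proof of Theorem 6.1: Let A be an algorithm that colors any m-vertex $k_0$-colorable
graph with $c\,m^\alpha\log^\beta m$ colors and let r = r(A). We will use $C_k(n)$ to denote the coloring
bound achieved on n-vertex k-colorable graphs. First, formula (6.3) in the statement of the
theorem holds for the base case of $k = k_0$ since for $k = k_0$, we have:

    $C_{k_0}(n) = c\,n^{1-1/(k_0-r)}(\log n)^{\beta(k_0-r)/(k_0-r)}$
    $= c\,n^\alpha\log^\beta n$.

Let $c_k = c + (k - k_0)$ and let $f(n,k) = n^{1-1/(k-r)}(\log n)^{\beta(k_0-r)/(k-r)}$ as in Algorithm Recursive-Color.
So, assuming the bounds of Theorem 6.1 inductively for k' < k, we need to show that
$C_k(n) \le c_k f(n,k)$.

Since we can loop in step 3 of Algorithm Recursive-Color at most n/f(n,k) times, this
results in the recurrence:

    $C_k(n) \le C_{k-1}(f(n,k))\,[n/f(n,k)] + f(n,k)$.

So, substituting in the bounds of Theorem 6.1 inductively (and abbreviating f = f(n,k)), we have:

    $C_k(n) \le \left[c_{k-1} f^{1-\frac{1}{k-1-r}}(\log f)^{\frac{\beta(k_0-r)}{k-1-r}}\right]\left[\frac{n}{f}\right] + f$
    $\le c_{k-1} f^{1-\frac{1}{k-1-r}}(\log n)^{\frac{\beta(k_0-r)}{k-1-r}}\left[\frac{n}{f}\right] + f$
    $= c_{k-1}\,n\,f^{-\frac{1}{k-1-r}}(\log n)^{\frac{\beta(k_0-r)}{k-1-r}} + f$
    $= c_{k-1}\,n\left(n^{\frac{k-1-r}{k-r}}\right)^{-\frac{1}{k-1-r}}\left([\log n]^{\frac{\beta(k_0-r)}{k-r}}\right)^{-\frac{1}{k-1-r}}[\log n]^{\frac{\beta(k_0-r)}{k-1-r}} + f$
    $= c_{k-1}\,n^{1-\frac{1}{k-r}}[\log n]^{\beta(k_0-r)\left(\frac{1}{k-1-r} - \frac{1}{(k-r)(k-1-r)}\right)} + f$
    $= c_{k-1}\,n^{1-\frac{1}{k-r}}[\log n]^{\frac{\beta(k_0-r)}{k-r}} + f$
    $= c_{k-1} f + f$
    $= c_k f$. □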
6.2 Directly extending the k = 3 algorithm
6.2.1 Motivation 


In this section, we describe how the methods of Algorithm First-Approx of Chapter 4 can be 
applied directly to graphs of higher chromatic number, yielding improved coloring bounds 
for such graphs. Unfortunately, we do not know a way to extend the approach of Algorithm 
Improved-Approx in a similar way, though it can still provide a useful “base case”. 

The main idea of Algorithm First-Approx was to look at large subsets of the distance-2
neighbors of vertices in a 3-colorable graph: in particular, the sets $N_i(N(v) \cap I_j)$ for
each vertex v and each pair of indices i, j. The “well-distributed” property proved in
Theorems 4.5 and 4.6 ensures that one such set will be nearly half red under some legal
3-coloring of the graph, and the expansion property of Theorem 4.1 ensures the set is large
as well.

While the expansion property depended heavily on the graph being 3-colorable, the 
theorems forcing good distribution require only that the given graph have an independent 
set of large total degree (see Section 4.3.2). In particular, they simply require that there 
exist a large independent set R such that $D_R(V - R) \ge \lambda D(V - R)$ for some constant $\lambda$
and that the graph have sufficiently large minimum degree. So, we could conceivably make 
progress on graphs of a higher chromatic number than 3 by cascading several applications 
of the distance-2 neighbor-taking stage in the following way. 

Suppose, say, G is a 5-colorable graph and we wish to color G with f(n) colors. Then,
we know there exists an independent set R such that $D_R(V - R) \ge \frac{1}{4} D(V - R)$ and we can
establish a minimum degree of f(n). If we could guarantee that no two vertices shared too
many neighbors, we could look at the sets $T_{v,i,j}$ and be assured that one will be large and
have an independent set $R' = R \cap T_{v,i,j}$ such that $|R'| \approx \frac{1}{4}|T_{v,i,j}|$ using Theorems 4.5 and
4.6. Let us now focus on the subgraph G' induced by $T_{v,i,j}$, and let $V' = T_{v,i,j}$. Suppose
we could in addition somehow ensure that within G', the vertices of R' had about the same
average degree as the other vertices of V'. Then we would have $D(R') \approx \frac{1}{4} D(V')$, which
would imply that:

    $D_{R'}(V' - R') \approx \frac{1}{3} D(V' - R')$, (6.4)

since $D_{R'}(V' - R') = D(R')$ and $D(R') \approx \frac{1}{4} D(V') = \frac{1}{4}(D(V' - R') + D(R'))$, where we are
now counting degrees only within G'.

Now, if we re-establish a minimum degree without destroying (6.4) above, we could
then re-apply the distance-2 neighbor-taking process within G' to get a set V'' containing an
independent set R'' such that $|R''| \approx \frac{1}{3}|V''|$. If again we could ensure that $D(R'') \approx \frac{1}{3} D(V'')$
within the new graph G'', we would get:

    $D_{R''}(V'' - R'') \approx \frac{1}{2} D(V'' - R'')$.

Thus, one final application of examining the sets $T_{v,i,j}$ within G'' will yield some set on
which the BE/MS vertex-cover algorithm makes progress.

So, the two main ingredients needed to make this procedure go through are (1) how to
ensure that no two vertices share too many neighbors in common, and (2) how to get from
$|R'| \approx \lambda|V'|$ to $D(R') \approx \lambda D(V')$. These problems are solved in the following sections.
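
The way the fraction improves from stage to stage in this motivating argument can be traced with exact arithmetic. The sketch below is illustrative only: the function names are hypothetical, the starting value $\lambda = 1/(k-1)$ is an assumption matching the 5-colorable example read as starting at 1/4, and it models only the idealized identity that $D(R') \approx \lambda D(V')$ gives $D_{R'}(V' - R') \approx \frac{\lambda}{1-\lambda} D(V' - R')$.

```python
from fractions import Fraction as F


def next_lambda(lam):
    """From D(R') = lam * (D(V' - R') + D(R')) we get D(R') = lam/(1-lam) * D(V' - R')."""
    lam = F(lam)
    return lam / (1 - lam)


def cascade(k):
    """Fractions seen at successive stages, starting from 1/(k-1), until 1/2 is reached."""
    lam = F(1, k - 1)
    stages = [lam]
    while lam < F(1, 2):
        lam = next_lambda(lam)
        stages.append(lam)
    return stages


FIVE_COLORABLE = cascade(5)
```

Each stage turns 1/m into 1/(m-1), so a k-colorable graph needs k - 3 intermediate stages before the fraction reaches the 1/2 that the vertex-cover approximation can exploit.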


6.2.2 The bootstrapping algorithm 


We now describe procedures that allow us to “bootstrap” applications of Algorithm First-Approx
to graphs of higher chromatic number. The resulting algorithm Multi-Stage-Color will color
any $n$-vertex $k$-colorable graph with:

\[ f_k(n) = O(n^{\alpha(k)} \log^{\beta(k)} n) \text{ colors}, \]

where $\alpha(k)$ will be defined inductively in $k$, and $\beta(k)$ is a nondecreasing function such
that $\beta(k) < 5.5$. The exponent $\beta$ of the logarithm in fact approaches 5.5 as $k \to \infty$.
Because $\alpha$ is the critical value and the log factors are low-order terms, for purposes
of simpler analysis we will not attempt to get tight bounds and assume $\beta$ is fixed at
5.5 for all $k > 3$.


For base cases, $\alpha(2) = 0$ and $\alpha(3) = 3/8$ using algorithm Improved-Approx. The recursive
formula for $\alpha(k)$ for $k > 3$ is:

\[ \frac{1}{1 - \alpha(k)} = 2 - \frac{1}{2^{k-2}} + \left(1 - \frac{1}{2^{k-2}}\right)\frac{1}{1 - \alpha(k-2)}. \tag{6.5} \]

We will examine this formula in more detail later, but we just note here that $\alpha$ is
nondecreasing in $k$.
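As a quick numerical illustration, the recursion can be evaluated exactly with rational arithmetic. The sketch below assumes the recursion has the form $1/(1-\alpha(k)) = 2 - 2^{-(k-2)} + (1 - 2^{-(k-2)})/(1-\alpha(k-2))$ with the base cases $\alpha(2) = 0$ and $\alpha(3) = 3/8$ stated above; the function name `alpha` is only illustrative.

```python
from fractions import Fraction


def alpha(k: int) -> Fraction:
    """Exponent alpha(k), under the assumed form of the recursion:
    1/(1 - alpha(k)) = 2 - 2^-(k-2) + (1 - 2^-(k-2)) / (1 - alpha(k-2))."""
    if k == 2:
        return Fraction(0)
    if k == 3:
        return Fraction(3, 8)
    shrink = Fraction(1, 2 ** (k - 2))
    gamma_prev = 1 / (1 - alpha(k - 2))      # gamma(k-2) = 1/(1 - alpha(k-2))
    gamma = 2 - shrink + (1 - shrink) * gamma_prev
    return 1 - 1 / gamma


if __name__ == "__main__":
    for k in range(2, 9):
        a = alpha(k)
        print(f"k={k}: alpha = {a} ~ {float(a):.4f}")
```

Under this assumption the recursion gives $\alpha(4) = 3/5$ and $\alpha(5) = 91/131$, and the values are nondecreasing in $k$ over the range checked.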

We need in this section to redefine the value $\delta$ to depend on the chromatic number $k$ of
the graph $G$ we wish to color. In particular, we shall use:

\[ \delta = \delta(k) = \frac{1}{4^{k-1} \log n}. \]

The sets $I_i$ and $N_j(v)$ used in Chapter 4 now depend on this new quantity.

As mentioned previously, the theorems of Section 4.3.2 forcing good distribution do not 
require that the graph be 3-colorable, only that there exist a large independent set R such 
that $D_R(V - R) \ge \lambda D(V - R)$ for some constant $\lambda$ and that the graph have sufficiently
large minimum degree. Let us, in fact, repeat Corollary 4.7 here, removing all mention of
the chromatic number of the graph. (The fact that the graph was 3-colorable was used only
in showing that $\lambda \ge 1/2$.)
Corollary 6.3 (Variant of Corollary 4.7) Suppose $G = (V,E)$ is an $n$-vertex graph
such that (1) no two vertices share more than $s$ neighbors, (2) $G$ has minimum degree
$d_{min} \ge \max(s(1 + \delta), (3\log n)/\delta^2)$, and (3) $G$ contains an independent set $R$ such that
$D_R(V - R) \ge \lambda D(V - R)$ for some constant $\lambda \in [0,1]$. Then, for any $\delta = \Theta(1/\log n)$, for some
$v \in V$ and some $i,j \in [0,\ldots,\log_{1+\delta} n]$, the set

\[ T_{v,i,j} = N_j(N(v) \cap I_i) \]

has size at least $\Omega((d_{min})^2/(s \log^7 n))$ and the property that $|T_{v,i,j} \cap R| \ge \lambda(1 - 5\delta)|T_{v,i,j}|$.


We now present a new method to ensure that no two vertices share too many neighbors. 


Theorem 6.4 Given an $n$-vertex $k$-colorable graph $G$ containing two vertices that share
at least $n^{\frac{1-\alpha(k)}{1-\alpha(k-2)}}$ neighbors and an algorithm $A$ to color any $m$-vertex $(k-2)$-colorable
graph with $f_{k-2}(m)$ colors, Algorithm Sharing-Progress below will make progress towards an
$f_k(n)$-coloring of $G$.


Algorithm Sharing-Progress:

Given: (1) An $n$-vertex $k$-colorable graph $G$ containing two vertices that share at least
$n^{\frac{1-\alpha(k)}{1-\alpha(k-2)}}$ neighbors, and (2) an algorithm $A$ to color any $m$-vertex $(k-2)$-colorable
graph with $f_{k-2}(m)$ colors.

Output: Progress towards an $f_k(n)$-coloring of $G$.

1. Let $S = N(x) \cap N(y)$ where $x$ and $y$ share at least $n^{\frac{1-\alpha(k)}{1-\alpha(k-2)}}$ neighbors, and let
$G_S$ be the subgraph induced by set $S$.

2. Run algorithm $A$ on $G_S$. Note that if $G_S$ is $(k-2)$-colorable, then $A$ will color
$G_S$ with at most:

\[ f_{k-2}(|S|) = O(|S|^{\alpha(k-2)} (\log |S|)^{\beta(k-2)}) \le O(|S|^{\alpha(k-2)} (\log n)^{\beta(k)}) \text{ colors}, \]

(using $|S| < n$ and $\beta$ non-decreasing). Thus, Algorithm $A$ will find an independent
set of size at least:

\[ \Omega\left(\frac{|S|^{1-\alpha(k-2)}}{(\log n)^{\beta(k)}}\right) = \Omega\left(\frac{n^{1-\alpha(k)}}{(\log n)^{\beta(k)}}\right) \text{ (for the given choice of } |S|\text{)} = \Omega(n/f_k(n)). \]

Thus, if $G_S$ is $(k-2)$-colorable, then we have made progress of Type 1 [Large-IS].

3. If we did not make progress in Step 2, it must be that $G_S$ was not $(k-2)$-colorable.
The only way this could be is if $x$ and $y$ must be the same color under
any legal $k$-coloring of $G$ (if $x$ and $y$ had different colors, their common neighbors
could use only the remaining $k-2$ colors). So, we can merge vertices $x$ and $y$ and make progress
of Type 3 [Same-Color].


The argument given in Algorithm Sharing-Progress proves Theorem 6.4. 
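The case analysis of Algorithm Sharing-Progress can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the graph is assumed to be a dict mapping each vertex to its set of neighbors, `color_subgraph` is a hypothetical stand-in for algorithm $A$, `max_colors` plays the role of the budget $f_{k-2}(|S|)$, and `greedy_coloring` is only a toy substitute used to exercise the code.

```python
def induced_subgraph(adj, vertices):
    """Adjacency of the subgraph induced by `vertices`."""
    vs = set(vertices)
    return {v: adj[v] & vs for v in vs}


def is_proper(adj, coloring):
    """True when no edge has both endpoints the same colour."""
    return all(coloring[u] != coloring[v] for u in adj for v in adj[u])


def greedy_coloring(adj):
    """Toy stand-in for algorithm A (assumption, not from the thesis)."""
    coloring = {}
    for v in sorted(adj):
        used = {coloring[u] for u in adj[v] if u in coloring}
        coloring[v] = next(c for c in range(len(adj) + 1) if c not in used)
    return coloring


def sharing_progress(adj, x, y, color_subgraph, max_colors):
    """Return ("large-is", independent_set) or ("same-color", (x, y))."""
    shared = adj[x] & adj[y]
    if not shared:
        raise ValueError("x and y share no neighbours")
    sub = induced_subgraph(adj, shared)
    coloring = color_subgraph(sub)
    within_budget = (coloring is not None and is_proper(sub, coloring)
                     and len(set(coloring.values())) <= max_colors)
    if within_budget:
        # Type 1 [Large-IS]: the largest colour class is independent.
        classes = {}
        for v, c in coloring.items():
            classes.setdefault(c, set()).add(v)
        return "large-is", max(classes.values(), key=len)
    # Type 3 [Same-Color]: A exceeded its budget, so x and y can be merged.
    return "same-color", (x, y)
```

For $k = 3$ the budget is one color, so the common neighborhood either is independent (progress of Type 1) or contains an edge, which forces $x$ and $y$ to share a color.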


We now use Algorithm Sharing-Progress in a procedure that allows us to “bootstrap” 


applications of Step 3 of Algorithm First-Approx. 


Algorithm Bootstrap:

Given: (1) Values $\alpha \in [0,1]$, $\beta \in Z$ and $\delta = \Theta(1/\log n)$, and (2) An $m$-vertex subgraph
$H$ ($m > 1/\delta^2$) of an $n$-vertex graph $G$ such that $H$ contains an independent set $R$
with $|R| \ge \lambda|V(H)|$ for some constant $\lambda > 0$.

Output: Either: (1) progress towards an $O(n^{\alpha} \log^{\beta} n)$-coloring of $G$, or else (2) at
most $m/2$ subgraphs $G_0, G_1, \ldots, G_{m/2-1}$ of $H$ such that with high probability at least
one $G_i$ has both a minimum degree of $(\delta^2 m) n^{\alpha-1} \log^{\beta} n$ and, considering only edges
within $G_i$, $D(R \cap V(G_i)) \ge (\lambda - 2\delta) D(V(G_i))$.

1. Let $G_0 = (V_0, E_0) = H$. Inductively create graph $G_i = (V_i, E_i)$ from graph $G_{i-1}$
for $i = 1, 2, \ldots, m/2 - 1$ by selecting an edge at random in $E_{i-1}$ and deleting
both endpoints. So, $|V_i| = |V_{i-1}| - 2$.

2. For each $G_i$ with at least $\delta m$ vertices, while $G_i$ contains a vertex with degree
less than $\delta^2 m n^{\alpha-1} \log^{\beta} n$: delete from $G_i$ the vertex of minimum degree and all
incident edges.
Suppose we have removed more than $\delta^2 m$ vertices from any $G_i$. Since within
the set $W_i$ of vertices deleted from $G_i$, the degree of each vertex can be at most
$\delta^2 m n^{\alpha-1} \log^{\beta} n$, we can greedily find an independent set inside $W_i$ of size at least:

\[ \frac{\delta^2 m}{\delta^2 m n^{\alpha-1} \log^{\beta} n} = n/(n^{\alpha} \log^{\beta} n). \]

So, we make progress Type 1 [Large-IS] towards an $O(n^{\alpha} \log^{\beta} n)$-coloring of $G$.

3. If we did not make progress in Step 2, then output the graphs $G_i$ for $i = 0, 1, \ldots, m/2 - 1$.
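Steps 1 and 2 of Algorithm Bootstrap can be sketched as two small routines. This is an illustrative sketch under stated assumptions: the graph is a dict from integer vertices to neighbor sets, the names `edge_deletion_sequence` and `prune_low_degree` are hypothetical, the degree threshold is passed in directly rather than computed from $\delta$, $m$, $n$, $\alpha$, and $\beta$, and the sequence simply stops when no edge remains.

```python
import random


def edge_deletion_sequence(adj, rng):
    """Step 1: repeatedly pick a uniformly random remaining edge and delete
    both endpoints. Returns the vertex sets V_0, V_1, ... in order."""
    alive = set(adj)
    stages = [set(alive)]
    while True:
        edges = [(u, v) for u in alive for v in adj[u] if v in alive and u < v]
        if not edges:
            break
        u, v = rng.choice(edges)
        alive -= {u, v}
        stages.append(set(alive))
    return stages


def prune_low_degree(adj, vertices, min_degree):
    """Step 2: while some vertex has degree below min_degree (counting only
    surviving neighbours), delete the vertex of minimum degree.
    Returns (surviving vertices, deleted vertices in deletion order)."""
    alive = set(vertices)
    deleted = []

    def degree(v):
        return len(adj[v] & alive)

    while alive:
        v = min(alive, key=lambda w: (degree(w), w))
        if degree(v) >= min_degree:
            break
        alive.remove(v)
        deleted.append(v)
    return alive, deleted
```

Because an edge is chosen uniformly at random, each vertex is removed with probability proportional to its current degree, which is the property the analysis of Algorithm Bootstrap relies on.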


Theorem 6.5 [Algorithm Bootstrap works as guaranteed] Suppose we are given an $m$-vertex subgraph
$H$ ($m > 1/\delta^2$) of an $n$-vertex graph $G$ such that $H$ contains an independent set $R$ with
$|R| \ge \lambda|V(H)|$ for some constant $\lambda$. Then, either (1) Algorithm Bootstrap makes progress
towards an $O(n^{\alpha} \log^{\beta} n)$-coloring of $G$ in Step 2, or else (2) with high probability, one of
the subgraphs $G_i = (V_i, E_i)$ has both a minimum degree of $\delta^2 m n^{\alpha-1} \log^{\beta} n$ and within the
subgraph, $D(R \cap V_i) \ge (\lambda - 2\delta) D(V_i)$.


Proof: Let us consider the graphs $G_i$ created after Step 1 of Algorithm Bootstrap, but
before deleting vertices in Step 2. Let $R_i = V_i \cap R$ and let $N = m(1 - \delta)/2$; note that set
$V_N$ contains $\delta m$ vertices. We show now that with high probability, for some index $i < N$,
we have $D(R_i) \ge (\lambda - \delta)D(V_i)$. The idea of the argument is that since we are removing
vertices with a probability proportional to their degree, if $D(R_i) < (\lambda - \delta)D(V_i)$ for all
such $i$, then we would remove many fewer vertices from $R$ than from $V - R$. In fact, with
high probability we would remove so many fewer that once we reach graph $G_N$, the set $R_N$
would be larger than $V_N$, a clear contradiction.

For each $i < N$, let $A_i$ be the event that in creating $G_{i+1}$ from $G_i$, we delete an edge
with an endpoint in $R_i$. Since the number of edges in $E_i$ with an endpoint in $R_i$ is exactly
$D(R_i)$ (because $R_i$ is an independent set), we have:

\[ \Pr[A_i] = D(R_i)/|E_i| = 2D(R_i)/D(V_i). \tag{6.6} \]

Suppose for some index $i < N$ we have $D(R_i) < (\lambda - \delta)D(V_i)$. Then, the probability event
$A_i$ occurs is at most $2(\lambda - \delta)$.

Let $p = 2(\lambda - \delta)$ and assume for contradiction that $D(R_i) < (\lambda - \delta)D(V_i)$ for every
$i < N$. So, for each $i < N$, the probability that the $i$th edge removed from $G$ has an
endpoint in $R$ is less than $p$. Since we remove $N$ edges to create $G_N$ and every time we
remove an edge the probability it has an endpoint in $R$ is less than $p$, by Chernoff bounds [2]
the probability we remove more than $pN(1 + \delta)$ vertices from $R$ is at most $e^{-\Omega(\delta^2 pN)}$. Since
$pN = \Omega(m)$ and we assume $m > 1/\delta^2$ in the statement of the theorem, the probability we
remove more than $pN(1 + \delta)$ vertices from $R$ is $o(1)$. Thus, with high probability:

\begin{align*}
|R_N| &\ge \lambda m - pN(1 + \delta) \\
&= \lambda m - 2(\lambda - \delta)[m(1 - \delta)/2](1 + \delta) \\
&= m[\lambda - (\lambda - \delta)(1 - \delta^2)] \\
&= \delta m + m\delta^2(\lambda - \delta) \\
&> \delta m. \quad \text{(since } \lambda > \delta\text{)}
\end{align*}

So, with high probability, $|R_N| > |V_N|$, a contradiction. Thus, with high probability our
assumption that $D(R_i) < (\lambda - \delta)D(V_i)$ for every $i < N$ is incorrect; that is, for some $V_i$ of
size at least $\delta m$, we have $D(R_i) \ge (\lambda - \delta) D(V_i)$.

Now, let $i$ be such that $|V_i| \ge \delta m$ and $D(R_i) \ge (\lambda - \delta)D(V_i)$ before Step 2 of Algorithm
Bootstrap. In Step 2, if at most $\delta^2 m$ vertices are removed, then we remove at most a fraction
$\delta$ of the vertices of $V_i$ in order to establish the desired minimum degree. Since we are always
removing the vertex of least degree, we remove at most $\delta D(V_i)$ from the total degree sum
of the subgraph. Even if, at worst, all the vertices removed were from the set $R_i$, we still
have in the graph remaining that:

\[ D(R_i) \ge (\lambda - 2\delta)D(V_i), \]

as claimed. ∎


Given Theorem 6.5, we have an improved approximation algorithm for coloring graphs 
of chromatic number k > 3 as follows. We first apply algorithm Sharing-Progress; we 
then run the distance-2 neighbor-taking stage of Algorithm First-Approx k — 2 times, using 
Algorithm Bootstrap to “clean up” the graph in between applications; and finally, we use 
the BE/MS vertex-cover algorithm. The formal algorithm to color any $k$-colorable graph
with $O(n^{\alpha(k)} \log^{\beta(k)} n)$ colors is given below. For simplicity, we have separated out the
distance-2-neighbor/bootstrap step into a separate procedure.


Algorithm Multi-Stage-Color:
Given: An $n$-vertex $k$-colorable graph $G$.

Output: Progress towards an $O(n^{\alpha} \log^{\beta} n)$-coloring of $G$ for $\alpha = \alpha(k)$ as defined by
the recursion in equation (6.5), and $\beta$ at most 5.5.
Let $f(n) = n^{\alpha} \log^{\beta} n$.

1. [Base case] If $k = 2$ then just color $G$ with 2 colors. If $k = 3$, then run Algorithm
Improved-Approx on $G$.

2. [Minimum degree] For each vertex $v$, if $d(v) < f(n)$, make progress Type 2.

3. [Minimum sharing of neighbors] For each pair of vertices $u, v$, if $|N(u) \cap N(v)| \ge n^{\frac{1-\alpha(k)}{1-\alpha(k-2)}}$,
then make progress using Algorithm Sharing-Progress. Note that Algorithm
Sharing-Progress will use Algorithm Multi-Stage-Color recursively on $(k-2)$-colorable graphs.

4. [Initial distance-2 neighbors] For each vertex $v$ and each pair $i,j \in [0,\ldots,\log_{1+\delta} n]$,
let $G_{v,i,j}$ be the subgraph induced by the set $N_j(N(v) \cap I_i)$.

5. [Additional neighbor-taking stages] For each graph $G_{v,i,j}$, run Procedure
Iterate-neighbors below on input $(n, k, G_{v,i,j}, k - 3)$.
If the algorithm makes progress on any of the inputs given, then halt with success.
Otherwise, let $G_1, \ldots, G_q$ be all the graphs returned by Iterate-neighbors, for
$q = O((\log_{1+\delta} n)^{2k-4} n^{2k-5})$.

6. [Vertex-Cover approximation] Run the BE/MS vertex-cover algorithm on the
graphs $G_1, \ldots, G_q$.


Procedure Iterate-neighbors: $(n, k, G', iter)$

Given: Values $n$ and $k$. An $m$-vertex subgraph $G'$ of some $n$-vertex graph $G$, and
a number of iterations $iter$.

Output: $O([m^2(\log_{1+\delta} m)^2]^{iter})$ subgraphs of $G'$ or else progress towards an
$O(n^{\alpha(k)} \log^{\beta(k)} n)$-coloring of $G$.

P1. If $iter = 0$, then return $G'$.

P2. If $iter \ge 1$, then run Algorithm Bootstrap on $G'$ and values $\alpha = \alpha(k)$, $\beta = \beta(k)$,
and $\delta = \delta(k)$.

P3. If Algorithm Bootstrap returns progress towards an $O(n^{\alpha(k)} \log^{\beta(k)} n)$-coloring
of $G$, then halt with success. Otherwise, let $H_0, \ldots, H_{m/2-1}$ be the subgraphs
returned.

P4. Now, for each $H_l$ ($0 \le l \le m/2 - 1$), for each vertex $v$ in $H_l$, and each pair of indices
$i, j \in [0, \ldots, \log_{1+\delta} m]$:
(note: there are at most $m^2(\log_{1+\delta} m)^2$ different 4-tuples $(l, v, i, j)$)
(a) Let $G_{l,v,i,j}$ be the subgraph of $H_l$ induced by $N_j(N(v) \cap I_i)$, where neighborhoods
are taken within $H_l$.
(b) Run: Iterate-neighbors$(n, k, G_{l,v,i,j}, iter - 1)$.


Theorem 6.6 Algorithm Multi-Stage-Color, given any $n$-vertex $k$-colorable graph, makes
progress towards a coloring with $O(n^{\alpha(k)} (\log n)^{5.5})$ colors, for $\alpha(k)$ as defined in
equation (6.5).


Before proving Theorem 6.6, let us examine the claimed performance more closely. Let
$\gamma(k) = \frac{1}{1-\alpha(k)}$. So, equation (6.5) can be written as:

\[ \gamma(k) = 2 - \frac{1}{2^{k-2}} + \left(1 - \frac{1}{2^{k-2}}\right)\gamma(k-2). \tag{6.7} \]

One can see from this equation immediately that $\gamma(k) < 2 + \gamma(k-2)$; that is, if we
increase $k$ by 2, then $\gamma$ increases by less than 2. We can compare this with the simpler
approach from Section 6.1. Algorithm Recursive-Color given there colors $k$-colorable
graphs with $O(n^{\alpha'(k)})$ colors for $\alpha'(k) = 1 - \frac{1}{k-r}$ for some constant $r$. Thus, the quantity
$\gamma'(k) = \frac{1}{1-\alpha'(k)}$ equals $k - r$ and $\gamma'(k) = 2 + \gamma'(k-2)$. Since the function $g(x) = \frac{1}{1-x}$ is an
increasing function of $x$, for algorithm Multi-Stage-Color the exponent $\alpha$ does not rise as
rapidly as in algorithm Recursive-Color. Thus, the new approach yields better bounds. Because
Algorithm Multi-Stage-Color is slower than algorithm Recursive-Color, one can achieve
time/performance tradeoffs by running the faster algorithm with the slower algorithm as a
base case for some $k = k_0$. Table 6.1 at the beginning of this chapter shows the results for
both algorithms and for various combinations. In particular, for example, we can substitute
the bound of Theorem 6.6 for $k = 4$ into the bound of Theorem 6.1 to get the following
corollary.


Corollary 6.7 Algorithm Recursive-Color using algorithm Multi-Stage-Color as a base case 


for k = 4, colors any n-vertex k-colorable graph (k > 4) with at most: 
O (n'- 97 (log n) #G=¥7)) 


colors. 


Proof of Theorem 6.6:

We may assume $k > 3$ since otherwise, we just run Algorithm Improved-Approx in Step
1 of Multi-Stage-Color. Define $s_k(n) = n^{\frac{1-\alpha(k)}{1-\alpha(k-2)}}$, and let $\alpha = \alpha(k)$ and $\beta = \beta(k)$. Steps
2 and 3 of Algorithm Multi-Stage-Color establish that the graph has a minimum degree of
$n^{\alpha} \log^{\beta} n$ and that no two vertices share more than $s_k(n)$ neighbors.

Since $G$ is $k$-colorable, it must contain an independent set $R$ with $D_R(V - R) \ge \frac{1}{k-1} D(V - R)$.
So, by Corollary 6.3, one of the graphs $G' = G_{v,i,j}$ created in Step 4
will both have size at least:

\[ m_1 = \Omega\left((d_{min})^2/(s_k(n) \log^7 n)\right) = \Omega\left(n^{2\alpha} \log^{2\beta} n/(s_k(n) \log^7 n)\right), \tag{6.8} \]

and contain an independent set of at least a $\lambda_1 = \frac{1}{k-1}(1 - 5\delta) \ge (\frac{1}{k-1} - 5\delta)$ fraction of its
vertices.*

We now examine the call to procedure Iterate-neighbors. Suppose Iterate-neighbors is
called with a graph $G'$ of at least $m_i$ vertices containing an independent set of at least a $\lambda_i$
fraction of its nodes. By Theorem 6.5, if Step P3 does not halt with success immediately,
then one of the graphs $H_l$ produced will have both a minimum degree of $\delta^2 m_i n^{\alpha-1} \log^{\beta} n$
and contain an independent set $R'$ with $D(R') \ge (\lambda_i - 2\delta)D(V(H_l))$. Rewriting the latter
inequality, we have $D(R') \ge (\lambda_i - 2\delta)[D(V(H_l) - R') + D(R')]$, so:

\[ D_{R'}(V(H_l) - R') = D(R') \ge \frac{\lambda_i - 2\delta}{1 - \lambda_i + 2\delta} D(V(H_l) - R'). \]

*One can verify that the minimum degrees and the values $m_i$ defined satisfy the technical conditions of
Corollary 6.3 (min degree $\ge \max(s(1 + \delta), (3\log n)/\delta^2)$) and Theorem 6.5 ($m_i > 1/\delta^2$).


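Using the minimum degree bound and degree ratios above, Corollary 6.3 implies that one of
the sets $G_{l,v,i,j}$ produced in Step P4(a) will both have size at least $m_{i+1}$ and an independent
set of at least a fraction $\lambda_{i+1}$ of its vertices, where:

\begin{align*}
m_{i+1} &= \delta^4 m_i^2 n^{2\alpha-2} (\log n)^{2\beta}/(s_k(n) \log^7 n) \\
&= \Omega(m_i^2 n^{2\alpha-2} (\log n)^{2\beta-4}/(s_k(n) \log^7 n)) \\
&= \Omega(m_i^2 n^{2\alpha-2}/s_k(n)), \quad \text{(for } \beta = 5.5\text{)} \tag{6.9}
\end{align*}

and

\begin{align*}
\lambda_{i+1} &\ge \frac{\lambda_i - 2\delta}{1 - \lambda_i + 2\delta} - 5\delta \\
&\ge \frac{\lambda_i}{1 - \lambda_i} - 13\delta \quad \text{for } \lambda_i \le 1/2. \tag{6.10}
\end{align*}

Thus, one of the graphs $G_l$ returned to Step 5 of Algorithm Multi-Stage-Color will have at
least $m_{k-2}$ vertices and contain an independent set of size at least $\lambda_{k-2}|V(G_l)|$, where we
must now solve for $m_{k-2}$ and $\lambda_{k-2}$.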

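Claim 1: $\lambda_i \ge \frac{1}{k-i} - 4^{i+1}\delta$ for $0 < i \le k - 2$.

Proof: For $i = 1$ the claim holds. For $i > 1$, by induction and using equation (6.10), we
have:

\begin{align*}
\lambda_{i+1} &\ge \left(\tfrac{1}{k-i} - 4^{i+1}\delta\right) \Big/ \left(\tfrac{k-i-1}{k-i} + 4^{i+1}\delta\right) - 13\delta \\
&\ge \left(\tfrac{1}{k-i} - 2 \cdot 4^{i+1}\delta\right) \Big/ \left(\tfrac{k-i-1}{k-i}\right) - 13\delta \\
&= \tfrac{1}{k-i-1} - 2 \cdot 4^{i+1}\delta \cdot \tfrac{k-i}{k-i-1} - 13\delta \\
&\ge \tfrac{1}{k-i-1} - 3 \cdot 4^{i+1}\delta - 13\delta \quad \text{(for } i < k - 2\text{)} \\
&\ge \tfrac{1}{k-i-1} - 4^{i+2}\delta. \quad \text{(for } i + 1 \ge 2\text{)} \qquad \blacksquare
\end{align*}

So, for $\delta = \delta(k) = \frac{1}{4^{k-1}\log n}$ we have:

\[ \lambda_{k-2} \ge \frac{1}{2} - \frac{1}{\log n}. \tag{6.11} \]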
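The growth of the independent-set fraction across stages can be traced numerically. The sketch below assumes the per-stage update $\lambda_{i+1} = (\lambda_i - 2\delta)/(1 - \lambda_i + 2\delta) - 5\delta$ with $\lambda_1 = \frac{1}{k-1}(1 - 5\delta)$, and compares it against a lower bound of the assumed form $\frac{1}{k-i} - 4^{i+1}\delta$; the function name `lambda_chain` and the sample value of $\delta$ are illustrative only.

```python
from fractions import Fraction


def lambda_chain(k, delta=Fraction(0)):
    """Fractions lambda_1, ..., lambda_{k-2} under the assumed update rule.
    With delta = 0 this is the idealised chain 1/(k-1), 1/(k-2), ..., 1/2."""
    lam = Fraction(1, k - 1) * (1 - 5 * delta)
    chain = [lam]
    for _ in range(k - 3):
        lam = (lam - 2 * delta) / (1 - lam + 2 * delta) - 5 * delta
        chain.append(lam)
    return chain


if __name__ == "__main__":
    k = 5
    delta = Fraction(1, 4 ** (k - 1) * 20)   # e.g. log n = 20 (illustrative)
    for i, lam in enumerate(lambda_chain(k, delta), start=1):
        bound = Fraction(1, k - i) - 4 ** (i + 1) * delta
        print(i, float(lam), float(bound))
```

With $\delta = 0$ and $k = 5$ the chain is $1/4, 1/3, 1/2$, matching the informal cascade described in the overview; with a small positive $\delta$ each value stays above the assumed bound.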


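Claim 2: $m_i = \Omega(n^{(2^{i+1}-2)\alpha} \cdot n^{2-2^i} \cdot [s_k(n)]^{1-2^i})$.

Proof: One can easily check that the claim holds for the base case of $i = 1$, using
equation (6.8) and the fact that for $\beta = 5.5$, $\log^{2\beta} n \ge \log^7 n$. For $i \ge 1$, we can check
inductively that (6.9) satisfies the claim as follows:

\begin{align*}
m_{i+1} &= \Omega(m_i^2 n^{2\alpha-2}/[s_k(n)]) \\
&= \Omega(n^{(2^{i+1}-2)2\alpha} \cdot n^{2(2-2^i)} \cdot [s_k(n)]^{2(1-2^i)} \cdot n^{2\alpha-2}/[s_k(n)]) \\
&= \Omega(n^{(2^{i+2}-4)\alpha + 2\alpha} \cdot n^{4-2^{i+1}-2} \cdot [s_k(n)]^{2-2^{i+1}-1}) \\
&= \Omega(n^{(2^{i+2}-2)\alpha} \cdot n^{2-2^{i+1}} \cdot [s_k(n)]^{1-2^{i+1}}).
\end{align*}

So,

\[ m_{k-2} = \Omega(n^{(2^{k-1}-2)\alpha} \cdot n^{2-2^{k-2}} \cdot [s_k(n)]^{1-2^{k-2}}). \tag{6.12} \]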


Thus, one of the graphs of Step 5 of Algorithm Multi-Stage-Color will have an independent
set of at least $(\frac{1}{2} - \frac{1}{\log n})$ of its vertices (from equation (6.11)) and have size at least $m_{k-2}$,
as given in equation (6.12). By Lemma 4.9, Step 6 will find an independent set of size at
least $m_{k-2}/\log n$.

Thus, to prove Theorem 6.6 we must just show that $m_{k-2}/\log n = \Omega(n/(n^{\alpha} \log^{\beta} n))$.
Since $\beta(k)$ is set to 5.5 it is enough to have $m_{k-2} = \Omega(n^{1-\alpha(k)})$. Equivalently, using
equation (6.12), taking $\log_n$ of both sides, and substituting in $s_k(n) = n^{\frac{1-\alpha(k)}{1-\alpha(k-2)}}$, we just need to
show that:

\[ 1 - \alpha(k) \le \alpha(k)[2^{k-1} - 2] + [2 - 2^{k-2}] + \left[\tfrac{1-\alpha(k)}{1-\alpha(k-2)}\right](1 - 2^{k-2}). \]

Rearranging terms, this formula is equivalent to:

\[ [1 - \alpha(k)](2^{k-1} - 1) \le 2^{k-2} + \left[\tfrac{1-\alpha(k)}{1-\alpha(k-2)}\right](1 - 2^{k-2}), \]

or:

\[ \frac{2^{k-2}}{1-\alpha(k)} \ge 2^{k-1} - 1 + \frac{2^{k-2} - 1}{1-\alpha(k-2)}. \]

Dividing both sides by $2^{k-2}$ and rearranging one final time, we find that we just need:

\[ \frac{1}{1-\alpha(k)} \ge 2 - \frac{1}{2^{k-2}} + \left(1 - \frac{1}{2^{k-2}}\right)\frac{1}{1-\alpha(k-2)}. \]

But, this formula is exactly the definition of $\alpha(k)$ given in equation (6.5). So Algorithm
Multi-Stage-Color works as claimed.
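The exponent bookkeeping in this proof can be checked mechanically. The sketch below is an illustration under assumptions: it takes the recursion for $\alpha(k)$ in the form $1/(1-\alpha(k)) = 2 - 2^{-(k-2)} + (1-2^{-(k-2)})/(1-\alpha(k-2))$, writes $\sigma$ for the exponent of $s_k(n)$, ignores logarithmic factors, and uses hypothetical function names. It verifies that squaring-and-shifting per the size recurrence reproduces the closed form of Claim 2 and that the final exponent equals $1 - \alpha(k)$.

```python
from fractions import Fraction


def alpha(k):
    """alpha(k) under the assumed form of the recursion (base cases 0 and 3/8)."""
    if k == 2:
        return Fraction(0)
    if k == 3:
        return Fraction(3, 8)
    shrink = Fraction(1, 2 ** (k - 2))
    gamma = 2 - shrink + (1 - shrink) / (1 - alpha(k - 2))
    return 1 - 1 / gamma


def sigma(k):
    """Exponent of s_k(n): (1 - alpha(k)) / (1 - alpha(k-2))."""
    return (1 - alpha(k)) / (1 - alpha(k - 2))


def closed_form_exponent(k, i):
    """Exponent of n in the Claim 2 bound on m_i."""
    return (2 ** (i + 1) - 2) * alpha(k) + (2 - 2 ** i) + sigma(k) * (1 - 2 ** i)


def next_exponent(k, e):
    """One step of m_{i+1} ~ m_i^2 * n^(2*alpha - 2) / s_k(n), in exponents."""
    return 2 * e + 2 * alpha(k) - 2 - sigma(k)


if __name__ == "__main__":
    for k in range(4, 9):
        print(k, closed_form_exponent(k, k - 2), 1 - alpha(k))
```

For every $k$ tried, the exponent of $m_{k-2}$ comes out exactly $1 - \alpha(k)$, which is the sense in which the recursion is "exactly the definition" needed.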


Chapter 7 


Random models for k-colorable graphs 


While the problem of coloring worst-case k-colorable graphs seems quite difficult, it turns 
out that coloring random k-colorable graphs is much easier. In fact, it is well known by 
results of Kucera [23], Turner [38], and others that random k-colorable graphs can be k- 
colored in polynomial-time with high probability. These results show that, in fact, most 
k-colorable graphs are easy to k-color. Dyer and Frieze [18] go further and provide an 
algorithm that when amortized over all n-vertex k-colorable graphs, spends polynomial 
time on average per graph. Experimental work on various heuristics for coloring random 
k-colorable graphs has been done by Petford and Welsh [31]. 

The standard model for a random n-vertex graph is the model G(n, p) in which each 
possible edge $(u, v)$ is placed into the graph with probability $p$. This model has the property
that the distribution G(n, 1/2) is the same as that obtained by selecting a labeled n-vertex 
graph uniformly at random from the set of all n-vertex graphs. 

There are several natural models, however, for what one means by a random k-colorable 
graph. Dyer and Frieze examine several and prove relationships among them [18]. We focus 
here on one model that happens to be simplest to analyze, which we shall denote G(n, p,k). 
A graph is selected in G(n,p,k) according to the following procedure. First each labeled 
vertex is independently assigned to one of k color classes with equal probability 1/k. Then, 
independently for each pair $u, v$ of vertices in different color classes, the edge $(u, v)$ is placed
into the graph with probability $p$. We use the notation:

\[ G \leftarrow G(n, p, k) \]

to mean that $G$ is selected according to the distribution defined by this model.

The G(n, p,k) model is a natural one for a random n-vertex k-colorable graph, though 
even for p = 1/2 it is not equivalent to selecting a graph uniformly at random from the 
set of all n-vertex k-colorable graphs. In particular, graphs that can be k-colored in mul- 
tiple different ways are over-represented in G(n,1/2,k) since different assignments to the 
color classes may still lead to the same graph. (See Dyer and Frieze [18] for more on the 


relationship between the models.) 
In this chapter, we consider the problem of $k$-coloring graphs in $G(n,p,k)$ for as low an
average edge density as possible. We present an algorithm to $k$-color such graphs with high
probability for any constant $k$, and for $p \ge n^{\Omega(1)-1}$; that is, the procedure will work for the
average degree as low as $n^{\epsilon}$ for any fixed $\epsilon > 0$. Before describing that algorithm, however,
we point out first a quite easy method to $k$-color $G \leftarrow G(n, p, k)$ for $p \ge n^{-1/2+\epsilon}$ ($\epsilon > 0$).

The idea of the easier procedure is simply this: two vertices from the same color class
in $G$ will tend to share more neighbors in common than two vertices of different color. Two
vertices from the same class have an expected $n - n/k$ vertices they might potentially share
as neighbors, and so can be expected to share $n(1 - 1/k)p^2$ neighbors in common. However,
two vertices from different color classes have only an expected $n - 2n/k$ vertices they may
share as neighbors, and so they can be expected to share only $n(1 - 2/k)p^2$ neighbors in
common. For $p \ge n^{-1/2+\epsilon}$, these values are at least $n^{2\epsilon}(1 - 1/k)$ and $n^{2\epsilon}(1 - 2/k)$ respectively. Since
for any given pair of vertices $x, y$, the indicator random variables $X_v$ for the event that $v$ is
a neighbor to both $x$ and $y$ are mutually independent over all $v$ (and each occurring with
probability $p^2$ if $v$ is a different color from both $x$ and $y$), we may apply Chernoff bounds.
In particular, if $X = \sum_v X_v$, and $\mu = E[X]$ is the expected number of neighbors in common
between $x$ and $y$, Chernoff bounds state that for any $\delta > 0$,

\[ \Pr[X < (1 - \delta)\mu \text{ or } X > (1 + \delta)\mu] \le 2e^{-\delta^2 \mu/3} \]

(see Angluin and Valiant [2]). For $\mu = \Omega(n^{\epsilon})$, this probability is so small that even when
summed over all pairs of starting vertices $x, y$, the probability any pair shares a number of
neighbors that differs by more than a factor of $(1 \pm \delta)$ from the expectation is $o(1)$.

One thus finds that with high probability, all pairs of vertices selected in the same color
class share $n^{2\epsilon}[1 - 1/k](1 + o(1))$ neighbors and all pairs of vertices of different color share
only $n^{2\epsilon}[1 - 2/k](1 + o(1))$ neighbors in common. Thus, one can easily algorithmically
separate the color classes.
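The common-neighbor method can be sketched in a few lines. This is a minimal illustration, not the thesis's procedure: `sample_gnpk` and `classes_by_common_neighbors` are hypothetical names, the optional `colors` argument (fixing the hidden classes) is an addition for reproducibility, and the threshold is assumed to be supplied by the caller, for instance the midpoint $n p^2 (1 - 1.5/k)$ between the two expectations.

```python
import random


def sample_gnpk(n, p, k, rng, colors=None):
    """Sample a graph from G(n, p, k) as a list of neighbour sets.
    `colors` optionally fixes the hidden colour classes (an assumption made
    here for reproducible examples)."""
    if colors is None:
        colors = [rng.randrange(k) for _ in range(n)]
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if colors[u] != colors[v] and rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, colors


def classes_by_common_neighbors(adj, threshold):
    """Greedily group vertices: u joins the class of an unlabelled seed x
    when x and u share more than `threshold` neighbours."""
    n = len(adj)
    label = [None] * n
    next_label = 0
    for x in range(n):
        if label[x] is not None:
            continue
        label[x] = next_label
        for u in range(n):
            if label[u] is None and len(adj[x] & adj[u]) > threshold:
                label[u] = next_label
        next_label += 1
    return label


def is_proper(adj, label):
    """True when no edge joins two vertices with the same label."""
    return all(label[u] != label[v] for u in range(len(adj)) for v in adj[u])
```

A typical call would be `adj, hidden = sample_gnpk(n, p, k, random.Random(seed))` followed by `classes_by_common_neighbors(adj, n * p * p * (1 - 1.5 / k))`; whether the recovered labels form a proper $k$-coloring depends on $p$ being large enough for the concentration argument above to apply.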


7.1. An improved algorithm 


In this section, we describe an algorithm based on an extension of the above idea that k- 
colors graphs in G(n, p, k) for much lower values of p. The results presented here are based 
on work joint with Joel Spencer. 

Another way to view the above observation is that vertices of the same color will have 
more paths of length 2 between them than vertices of different colors. This idea can be 
extended to paths of a longer constant length $l$ for improved bounds. If $l$ is even, it turns
out (see Section 7.1.1) that the expected number of paths of length $l$ between two vertices
of the same color is higher than the expected number for vertices of different color. If $l$ is
odd, the reverse holds. The difficulty in analyzing the case $l > 2$, however, is that the events
corresponding to the paths of length $l$ between two vertices are no longer independent.
Different paths of length 2 between two vertices $x$ and $y$ share no edges in common, but two
paths of length 3 might share an edge: for example, consider the two paths $(x, w, w', y)$ and
$(x, w', w, y)$. So, to prove that the number of paths will be with high probability close to
the number expected, one needs a more sophisticated probabilistic analysis. Luckily, such
analysis for a general class of this type of problem has already been provided by Spencer
[35] in the context of the random graph model $G(n, p)$. It turns out that the same analysis
holds for the $G(n, p, k)$ model as well.


7.1.1 Calculating expectations 


Let $l \ge 2$ be some integer constant and let us fix two vertices $x$ and $y$. By a “path” we
will always mean a simple path; that is, one that never touches any vertex more than once.
In this section, we calculate the expected number of paths of length $l$ between $x$ and $y$ in
$G \leftarrow G(n, p, k)$ and show this expectation differs by a constant factor depending on whether
or not $x$ and $y$ are in the same color class.

In particular let $E_l(p)$ be the expected number of paths of length $l$ between $x$ and $y$ in
$G \leftarrow G(n,p,k)$, and let $E_l^{same}(p)$ and $E_l^{diff}(p)$ be the expected number of such paths given
that $x$ and $y$ are chosen in the same or in different color classes respectively. Also, for $p > 0$
let $\Delta_l(p) = [E_l^{same}(p) - E_l^{diff}(p)]/E_l(p)$. When $p$ is clear from context, we will just write
$E_l^{same}$, $E_l^{diff}$ and $\Delta_l$ for the above quantities. We show now that for constant $k$ and $l$,
the value $\Delta_l$ is bounded away from 0 by a constant.

We can calculate the expected number of paths between $x$ and $y$ by fixing some arbitrary
sequence of distinct vertices (also distinct from $x$ and $y$) $v_1, \ldots, v_{l-1}$ and calculating the
probability of the event $B_l$ that each pair $(x, v_1), (v_1, v_2), \ldots, (v_{l-2}, v_{l-1}), (v_{l-1}, y)$ consists
of vertices chosen in different color classes. Given that the event $B_l$ occurs, the probability
the path $(x, v_1, \ldots, v_{l-1}, y)$ appears in $G$ is simply $p^l$. Given that $B_l$ does not occur, the
probability is 0. Since there are $(n-2)_{l-1} = (n-2)(n-3)\cdots(n-l) = n^{l-1}[1 - o(1)]$
possible such sequences $v_1, \ldots, v_{l-1}$, the expectation $E_l$ is simply $[1 - o(1)]p^l n^{l-1}\Pr[B_l]$.

For any event $X$, let $\Pr^{same}[X]$ and $\Pr^{diff}[X]$ be the probability that event $X$
occurs given that $x$ and $y$ are in the same color class, or given that $x$ and $y$ are in different
color classes, respectively. Thus, we have:

\begin{align*}
E_l = E_l(p) &= [1 - o(1)]p^l n^{l-1}\Pr[B_l], \tag{7.1} \\
E_l^{same} = E_l^{same}(p) &= [1 - o(1)]p^l n^{l-1}\Pr^{same}[B_l], \tag{7.2} \\
E_l^{diff} = E_l^{diff}(p) &= [1 - o(1)]p^l n^{l-1}\Pr^{diff}[B_l]. \tag{7.3}
\end{align*}
Also, since the $p^l (n-2)_{l-1}$ terms factor out of the expression for $\Delta_l$, we have:

\[ \Delta_l = \Delta_l(p) = \frac{\Pr^{same}[B_l] - \Pr^{diff}[B_l]}{\Pr[B_l]}. \tag{7.4} \]

So, to compute $E_l$, $E_l^{same}$, $E_l^{diff}$ and $\Delta_l$, we need only examine the fixed sequence $v_1, \ldots, v_{l-1}$
and the event that all are chosen colors such that the path $(x, v_1, \ldots, v_{l-1}, y)$ is a “potential
path” in the graph.

The value $\Pr[B_l]$ is quite easy to calculate: each vertex in the path has a $(1 - \frac{1}{k})$
probability of being given a different color than the preceding vertex. Thus,

\[ \Pr[B_l] = (1 - \tfrac{1}{k})^l. \tag{7.5} \]

Also, clearly, $\Pr[B_l] = \Pr^{same}[B_l] \cdot \Pr[x \text{ and } y \text{ are chosen the same color}] + \Pr^{diff}[B_l] \cdot \Pr[x \text{ and } y \text{ are chosen of different color}]$. So,

\[ \Pr[B_l] = \tfrac{1}{k}\Pr^{same}[B_l] + (1 - \tfrac{1}{k})\Pr^{diff}[B_l]. \tag{7.6} \]

So, from equations (7.5) and (7.6), in order to calculate $\Pr^{same}[B_l]$ and $\Pr^{diff}[B_l]$ it suffices
to prove the following theorem.

Theorem 7.1 $\Pr^{same}[B_l] - \Pr^{diff}[B_l] = (-1)^l (\frac{1}{k})^{l-1}$.


Proof: Define the following events $A_t$ and $B_t$ for $t \le l$. Notice that this definition of $B_t$
coincides with the previous definition of $B_l$ for $t = l$.

• For $t \le l$, let $A_t$ be the event that each pair $(x, v_1), (v_1, v_2), \ldots, (v_{t-2}, v_{t-1})$ consists of
vertices chosen in different color classes.

• For $t \le l$, let $B_t$ be the event that $A_t$ occurs and in addition vertex $v_{t-1}$ is chosen in
a different color class from $y$.

Also, for convenience, let $A_t - B_t$ be the event that $A_t$ occurs and $B_t$ does not. Notice
that since $B_t \subseteq A_t$, we have $\Pr[A_t - B_t] = \Pr[A_t] - \Pr[B_t]$. Also note that event $A_t$ does
not depend on whether $x$ and $y$ are chosen in the same or different color class.

The probability of event $A_t$ is easy to calculate: we just need $v_1$ a different color from
$x$, $v_2$ a different color from $v_1$, and so on up to $v_{t-1}$. Thus:

\[ \Pr[A_t] = (1 - 1/k)^{t-1}. \tag{7.7} \]

For $t = 2$, event $B_2$ is the event that $v_1$ is a different color from both $x$ and $y$, so $\Pr^{same}[B_2] = 1 - 1/k$
and $\Pr^{diff}[B_2] = 1 - 2/k$. For $t > 2$, event $B_t$ occurs if either: (1) $B_{t-1}$ occurs and
$v_{t-1}$ is of one of the $k - 2$ colors not used in $y$ or $v_{t-2}$, or (2) event $A_{t-1} - B_{t-1}$ occurs and
$v_{t-1}$ is of one of the $k - 1$ colors not used in $y$ or $v_{t-2}$. Thus,

\begin{align*}
\Pr^{same}[B_t] &= \Pr^{same}[B_{t-1}](1 - 2/k) + (\Pr[A_{t-1}] - \Pr^{same}[B_{t-1}])(1 - 1/k) \\
&= (1 - 1/k)^{t-1} - \tfrac{1}{k}\Pr^{same}[B_{t-1}]. \tag{7.8}
\end{align*}

Similarly,

\[ \Pr^{diff}[B_t] = (1 - 1/k)^{t-1} - \tfrac{1}{k}\Pr^{diff}[B_{t-1}]. \tag{7.9} \]

Thus, we can solve for $\Pr^{same}[B_l]$ as follows.

\begin{align*}
\Pr^{same}[B_l] &= (1 - 1/k)^{l-1} - \tfrac{1}{k}\left[(1 - 1/k)^{l-2} - \tfrac{1}{k}\Pr^{same}[B_{l-2}]\right] \\
&= (1 - 1/k)^{l-1} - \tfrac{1}{k}(1 - 1/k)^{l-2} + (\tfrac{1}{k})^2\left[(1 - 1/k)^{l-3} - \tfrac{1}{k}\Pr^{same}[B_{l-3}]\right] \\
&= (1 - 1/k)^{l-1} - \tfrac{1}{k}(1 - 1/k)^{l-2} + (\tfrac{1}{k})^2(1 - 1/k)^{l-3} - \cdots + (-1)^{l-2}(\tfrac{1}{k})^{l-2}\Pr^{same}[B_2]. \tag{7.10}
\end{align*}

Similarly,

\[ \Pr^{diff}[B_l] = (1 - 1/k)^{l-1} - \tfrac{1}{k}(1 - 1/k)^{l-2} + \cdots + (-1)^{l-2}(\tfrac{1}{k})^{l-2}\Pr^{diff}[B_2]. \tag{7.11} \]

From equations (7.10) and (7.11), we have:

\begin{align*}
\Pr^{same}[B_l] - \Pr^{diff}[B_l] &= (-1)^{l-2}(\tfrac{1}{k})^{l-2}\left[\Pr^{same}[B_2] - \Pr^{diff}[B_2]\right] \\
&= (-1)^l(\tfrac{1}{k})^{l-2}\left[(1 - 1/k) - (1 - 2/k)\right] \\
&= (-1)^l(\tfrac{1}{k})^{l-1}. \tag{7.12}
\end{align*}

Thus, we have proven the theorem. ∎
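The identity of Theorem 7.1 can be confirmed for small $k$ and $l$ by comparing the recurrence against brute-force enumeration of colorings. The sketch below is only a sanity check under the stated definitions; the function names are illustrative, and colors are represented as integers $0, \ldots, k-1$ with $x$ fixed to color 0.

```python
from fractions import Fraction
from itertools import product


def pr_b_recursive(l, k, same):
    """Pr[B_l] conditioned on x, y having the same (or different) colour,
    using Pr[B_t] = (1 - 1/k)^(t-1) - (1/k) Pr[B_{t-1}] from t = 2 upward."""
    pr = Fraction(k - 1, k) if same else Fraction(k - 2, k)   # t = 2
    for t in range(3, l + 1):
        pr = Fraction(k - 1, k) ** (t - 1) - pr / k
    return pr


def pr_b_bruteforce(l, k, same):
    """Same probability, by enumerating all colourings of v_1, ..., v_{l-1}
    and counting those in which consecutive path vertices differ."""
    cx, cy = 0, (0 if same else 1)
    good = 0
    for mid in product(range(k), repeat=l - 1):
        path = (cx, *mid, cy)
        good += all(a != b for a, b in zip(path, path[1:]))
    return Fraction(good, k ** (l - 1))


def gap(l, k):
    """Pr^same[B_l] - Pr^diff[B_l]."""
    return pr_b_recursive(l, k, True) - pr_b_recursive(l, k, False)


if __name__ == "__main__":
    for l in range(2, 6):
        print(l, gap(l, 3))
```

For $k = 3$ the gap alternates in sign as $1/3, -1/9, 1/27, \ldots$, so even $l$ favors same-colored endpoints and odd $l$ favors differently colored ones.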
By Theorem 7.1 and equation (7.4), we have $\Delta_l = [(-1)^l/k^{l-1}]/\Pr[B_l]$, so:

\[ |\Delta_l| \ge \frac{1}{k^{l-1}}. \tag{7.13} \]

Thus, for $l$ and $k$ both constant, we have $\Delta_l$ bounded away from 0 by some constant $> 0$.
Also, note by equations (7.2) and (7.3) that for $p \ge n^{-1+\epsilon}$ for some constant $\epsilon > 0$, for
sufficiently large integer $l$ (in fact $l \ge \lceil 2/\epsilon \rceil$), we have $E_l^{same}, E_l^{diff} = \Omega(n)$.


7.1.2 Analysis and the $l$-path algorithm


Note the following property of paths of constant length $l$ between fixed vertices $x$ and $y$.
The number of edges in the path divided by the number of “non-rooted” vertices (that is,
vertices not including $x$ and $y$) is $l/(l - 1)$. For any proper subgraph $S$ of such a path, the
quantity $|E(S)|/|V(S) - \{x, y\}|$ is strictly less. Because this ratio of edges to non-rooted
vertices is strictly less for all proper subgraphs, we say that paths between $x$ and $y$ are
“strictly balanced”. (A definition of “strictly balanced” for more general “rooted graphs”
appears in Definition 8.3 of Chapter 8.)

Spencer [35] proves that for any such strictly balanced graph and any constants $\delta, c > 0$,
if the expected number of copies of the graph in $G \leftarrow G(n, p)$ is at least $K \log n$ for sufficiently
large $K$, then the actual number of copies of the graph in $G \leftarrow G(n, p)$ will be within a factor of $(1 \pm \delta)$
of the expectation with probability $1 - o(n^{-c})$. In Appendix B, we prove a slightly weaker
(and simpler to prove) analog of Spencer’s theorem for the model $G(n, p, k)$. A special case
of the analog is the following. Let $Num_l(G)$ be the number of paths of length $l$ between $x$
and $y$ in $G$.

Corollary 7.2 (Corollary to Theorem B.2) For any constants $\delta, c > 0$, if $l$ and $p$ are
such that $K \log n \le E_l^{same}(p), E_l^{diff}(p) \le n^{\epsilon^*}$ for sufficiently large $K$ and sufficiently small
$\epsilon^*$, then for $G \leftarrow G(n,p,k)$:

1. $\Pr^{same}[(1 - \delta)E_l^{same} \le Num_l(G) \le (1 + \delta)E_l^{same}] \ge 1 - o(n^{-c})$,

2. $\Pr^{diff}[(1 - \delta)E_l^{diff} \le Num_l(G) \le (1 + \delta)E_l^{diff}] \ge 1 - o(n^{-c})$.

So, if the expected number of paths is sufficiently large, but not too large, then we can be
assured that with probability $1 - o(n^{-c})$, the number of paths between $x$ and $y$ will be close
to the expectation.*


For convenience, since $\Delta_l \le 1$, define constant $\epsilon' > 0$ so that:

\[ E_l \in [n^{\epsilon'}, 2n^{\epsilon'}] \implies E_l^{same}, E_l^{diff} \in [K \log n, n^{\epsilon^*}]. \tag{7.14} \]

By equation (7.13), for $l$ constant, the value $|\Delta_l|$ is at least some constant greater than 0.
Also, as noted at the end of Section 7.1.1, for $p \ge n^{-1+\epsilon}$ for any constant $\epsilon > 0$, there exists
integer $l$ such that $E_l^{same}, E_l^{diff} = \Omega(n)$. We want $l$ such that $E_l \in [n^{\epsilon'}, 2n^{\epsilon'}]$ but such an
integer $l$ might not exist. We can handle this difficulty by noting the following fact.


Fact 7.1 Let $G_q(n,p,k)$ be the model such that we first select $G \leftarrow G(n,p,k)$ and then delete
each edge with probability $q$. Then, $G_q(n,p,k) = G(n, p(1 - q), k)$.

That is, if we delete each edge in graph $G \leftarrow G(n,p,k)$ with some probability $q$, then the
distribution obtained is exactly the same as if we had just put each edge into the graph with
probability $p(1 - q)$ in the first place. So, given a graph $G \leftarrow G(n,p,k)$ and a value $l$ such
that $E_l > 2n^{\epsilon'}$, if we delete edges at random from $G$ with probability $q$ so that the expected
number of paths between $x$ and $y$ is between $n^{\epsilon'}$ and $2n^{\epsilon'}$, then we can apply Corollary 7.2
to the resulting graph. We now present the algorithm $l$-path.

*In fact, the restriction that $E_l^{same}, E_l^{diff} \le n^{\epsilon^*}$ is most certainly not necessary. We leave for future work
to show that Spencer’s theorem goes through for the expectation greater than $n^{\epsilon^*}$ as well.


Algorithm $l$-path
Given: An $n$-vertex $k$-colorable graph $G$.

Output: A $k$-coloring of $G$ or else failure.


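1. Let $d_{avg}$ be the average degree in $G$ and let $\hat{p} = \frac{d_{avg}}{(1 - 1/k)n}$.
(So, if $G \leftarrow G(n,p,k)$ for $p \ge n^{-1+\epsilon}$ then with high probability, $\hat{p} = p[1 + o(1)]$.)
Pick $l$ such that $\hat{p}^l n^{l-1} = \Omega(n)$ and let $\lambda = 1/k^{l-1}$.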

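2. Randomly delete each edge in $G$ with probability $q$ so that $E_l(\hat{p}(1 - q))$ lies in a fixed
constant-factor window strictly inside $[n^{\epsilon'}, 2n^{\epsilon'}]$, where $\epsilon'$ is such that Corollary 7.2 holds for $c = 2$ and $\delta = \lambda/4$,
and $E_l(\hat{p}(1 - q))$ is calculated using equations (7.1) and (7.5).
Let $\hat{E}_l^{same} = E_l^{same}(\hat{p}(1 - q))$, calculated using equations (7.2) and (7.10).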

3. For $i = 1$ to $k$ do:
(a) Pick an arbitrary uncolored vertex $x$ and let $S_i$ be the set containing $x$ and
all vertices with a number of paths of length $l$ to $x$ in the range:

\[ [(1 - \lambda/3)\hat{E}_l^{same}, (1 + \lambda/3)\hat{E}_l^{same}]. \]

If the set $S_i$ is not independent or $S_i$ contains previously-colored vertices,
then halt with failure.
(b) If $S_i$ is independent, then assign color $i$ to all vertices of $S_i$.

4. If in Step 3 we assigned one of $k$ colors to each vertex in the graph, then halt
with success. If we did not color each vertex, then halt with failure.


Theorem 7.3 Algorithm $l$-path $k$-colors graphs $G \leftarrow G(n,p,k)$ with high probability for
$p \ge n^{-1+\epsilon}$ for any constant $\epsilon > 0$.


Proof: Let $C_1, \ldots, C_k$ be the sets of vertices in each color class in the creation of graph
$G$ in model $G(n,p,k)$. Let us say that Step 3 succeeds in iteration $i$ if the set $S_i$ created
equals $C_j$ for some $1 \le j \le k$.

In Step 1, as noted, with high probability $\hat{p} = p[1 + o(1)]$, and let us for convenience
assume now that this is the case. So, $E_l(\hat{p}(1-q)) = [1+o(1)]E_l(p(1-q))$ and
$E_l^{\mathrm{same}}(\hat{p}(1-q)) = [1+o(1)]E_l^{\mathrm{same}}(p(1-q))$. Let $E_l^{\mathrm{same}} = E_l^{\mathrm{same}}(p(1-q))$, let
$E_l^{\mathrm{diff}} = E_l^{\mathrm{diff}}(p(1-q))$, and let $E_l = E_l(p(1-q))$.

In Step 2, if $E_l(\hat{p}(1-q))$ lies in the interval specified there, then $E_l \in [n^{\epsilon'}, 2n^{\epsilon'}]$ and
Corollary 7.2 applies. Thus, by Corollary 7.2 and since $\delta$ is chosen sufficiently small relative to
the gap $|E_l^{\mathrm{same}} - E_l^{\mathrm{diff}}|$, we have the following. With probability $1 - n^2[o(n^{-2})] = 1 - o(1)$,
for every pair of vertices $x, y$ in the same color class $C_i$, and for no pairs $x, y$ in different
color classes, the number of paths of length $l$ between $x$ and $y$ is in the range
$[(1-\delta)E_l^{\mathrm{same}}, (1+\delta)E_l^{\mathrm{same}}]$. Thus, with high probability, Step 3 succeeds for each
iteration $i$ and Algorithm $l$-path $k$-colors the graph.


Since $l$ is a constant, counting the number of simple paths of length $l$ between two
vertices can be done in polynomial time and so the $l$-path algorithm runs in polynomial
time. The running time of the algorithm could be improved considerably by counting
non-simple paths as well as simple paths. It is likely that the bounds claimed by Corollary 7.2
can be made to apply for that case as well.
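The remark about non-simple paths can be illustrated with a minimal sketch. It assumes the graph is given as a 0/1 adjacency matrix and counts walks (paths that may repeat vertices) by repeated matrix multiplication; the function name `count_walks` and the small example graph are hypothetical and not part of the analysis above.

```python
def count_walks(adj, length):
    """Return W where W[x][y] is the number of walks with `length` edges from x to y.

    `adj` is a symmetric 0/1 adjacency matrix given as a list of lists.
    Walks may revisit vertices, so non-simple paths are counted too.
    """
    n = len(adj)
    # Identity matrix: exactly one walk of length 0 from each vertex to itself.
    walks = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(length):
        # Multiplying by the adjacency matrix extends every walk by one edge.
        walks = [
            [sum(walks[i][m] * adj[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return walks


# A 4-cycle 0-1-2-3-0, which is 2-colorable with classes {0, 2} and {1, 3}.
cycle = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
two_step = count_walks(cycle, 2)
```

In this toy example the walk counts of length 2 already separate same-class pairs from different-class pairs, which is the kind of gap the $l$-path algorithm thresholds on.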


Chapter 8 


Semi-random graphs 


The results of Turner [38] and Dyer and Frieze [18] mentioned in the last chapter show 
that random k-colorable graphs, and thus most k-colorable graphs, are easy to k-color. 
Random k-colorable graphs, however, tend to be of a very special type. For instance, with 
high probability all vertices have nearly the same degree and all have nearly the same 
number of edges to each of the other (k — 1) color classes. So, graphs created in only 
a “somewhat random” manner may not be colored well by algorithms for G(n,p,k). On 
the other hand, worst-case assumptions may be overly pessimistic in many situations. To 
analyze the coloring of graphs in an intermediate range, we consider here two new graph 
models that lie in between the random and worst-case models. These new models provide a 
smooth transition between the random and worst-case scenarios and are based on a notion 
of a “semi-random source” from the cryptographic literature. We will call these models the
“semi-random” graph models.


8.1 Basic definitions and statement of results 


We define here two graph models, both based on the semi-random source (also called a
“slightly-random” source) of Santha and Vazirani [34] (see also [40] [39] [17]). In the first
model, which we denote $G_S(n,p,k)$, the graph is generated as follows. First, an adversary
splits the $n$ vertices into $k$ color classes (for $k = 3$, we denote these classes by red, blue, and
green). Then for each pair of vertices $u, v$ where $u$ and $v$ belong to different color classes
(running through such pairs in an order of its choosing), the adversary decides whether
or not to include edge $(u,v)$ in the graph. Once the adversary has made a choice for a
particular edge $(u,v)$, the choice is then reversed with probability $p$. Note that later choices
of the adversary may depend on the outcomes of earlier decisions, as in the Santha-Vazirani
source [34]. An alternative way to view this model, closer to the point of view used by
Santha and Vazirani, is the following. For each pair of vertices $u, v$ belonging to different color
classes, the adversary picks a bias $p_{uv}$ between $p$ and $1-p$ of a coin, which is then flipped
to determine whether edge $(u,v)$ is placed in the graph. The adversary may determine the
bias $p_{uv}$ based on the outcome of previous coin tosses. The two views of the model are
equivalent: if the adversary in the first description is deterministic, then it can be thought
of as selecting $p_{uv} \in \{p, 1-p\}$; if it is randomized, it can act as if selecting intermediate
values. We call $p$ the noise rate of the source, and this model the semi-random graph model.
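The first description of the source can be sketched in code as follows. This is only an illustration under stated assumptions: the adversary is modelled as a hypothetical callback `adversary(u, v, edges_so_far)`, the pairs are visited in a fixed order rather than in an order chosen by the adversary, and the names `semi_random_graph`, `colors`, and `rng` are ours.

```python
import random


def semi_random_graph(colors, adversary, p, rng=None):
    """Sample the edge set of a graph from a sketch of the semi-random source.

    colors    -- list mapping each vertex to its color class (the adversary's split)
    adversary -- hypothetical callback (u, v, edges_so_far) -> bool giving the
                 adversary's choice for the potential edge (u, v)
    p         -- noise rate: each choice is reversed with probability p
    """
    rng = rng or random.Random()
    n = len(colors)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if colors[u] == colors[v]:
                continue  # only pairs in different color classes can become edges
            # The adversary may look at the outcomes of all earlier decisions.
            choice = adversary(u, v, frozenset(edges))
            if rng.random() < p:
                choice = not choice  # the source reverses the decision
            if choice:
                edges.add((u, v))
    return edges
```

With noise rate 0 the adversary has full control, and with an adversary that never places edges each potential edge appears independently with probability exactly $p$.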

The second model we consider is a slightly modified version of the above, differing in
that the sizes of the $k$ color classes are all required to be $\Omega(n)$. We call this second model the
balanced semi-random graph model and denote it by $G_{SB}(n,p,k)$. Following the notation in
Chapter 7, we write:

    $G \leftarrow G_S(n,p,k)$ or $G \leftarrow G_{SB}(n,p,k)$

to denote that $G$ is selected according to the corresponding model for some unknown
adversary. We denote the semi-random and balanced semi-random models for a fixed adversary
$A$ by $G_S^A(n,p,k)$ and $G_{SB}^A(n,p,k)$ respectively. Formally, we say that an algorithm $t$-colors
$G \leftarrow G_S(n,p,k)$ with high probability (or $t$-colors $G \leftarrow G_{SB}(n,p,k)$ with high probability)
if it does so with high probability for any choice of the adversary.

A nearly equivalent way to view the semi-random models is that each edge between 
vertices of different color classes is actually placed into the graph with probability exactly 
p, and then an adversary may elect to place additional edges into the graph if it so chooses. 
This version is perhaps conceptually the most elegant, and an adversary in this version 
can simulate the adversaries in $G_S(n,p,k)$ and $G_{SB}(n,p,k)$, though the converse does not
hold. For example, the adversary here could make the graph a complete k-partite graph if 
it so desired; also, the adversary here may make its decisions after all coin tosses have been 
performed. While this version could conceivably be more difficult for coloring algorithms 
than the semi-random graph models as defined above, all the algorithms presented in this 
chapter work under both conditions. 

The semi-random models separate the algorithms for coloring random k-colorable graphs 
into two categories. Some of the algorithms for the random model [18][23] are highly 
dependent on facts such as the edge probabilities all being equal and are easily defeated 
by a semi-random source. Others, such as Turner’s No-Choice algorithm [38], adapt well
to the semi-random model. In particular, Turner’s bound of $p \ge n^{-1/k+\epsilon}$ for $k$-coloring
$G \leftarrow G(n,p,k)$ holds in the balanced semi-random model as well.

We present first in Section 8.2 an algorithm that achieves the same bound as Turner’s
algorithm but with significantly simpler analysis (and for 3-colorable graphs holds in the
slightly more general $G_S(n,p,3)$ model). We then, in Sections 8.3 and 8.4, present an
algorithm with better bounds for the balanced model. This algorithm 3-colors graphs in
$G_{SB}(n,p,3)$ with high probability for $p \ge n^{-0.6+\epsilon}$, and more generally for $k$-colorable graphs
works for $p \ge n^{-\frac{2k}{(k-1)(k+2)}+\epsilon}$. The algorithm of Sections 8.3 and 8.4 requires a more involved
analysis, and the use of the Janson inequality for estimating probabilities of “almost”
independent events. In Section 8.5 we present some relationships between the coloring problems
in the balanced and unbalanced semi-random models.


For convenience, we make the following definition. 


Definition 8.1 Let $G \leftarrow G_S(n,p,k)$ or $G \leftarrow G_{SB}(n,p,k)$. The pair $(u,v)$ is a potential
edge in $G$ if $u$ and $v$ belong to different color classes in the adversary’s color scheme.

For a subgraph $H$ of $G$ or a subset $U$ of $V(G)$, we will use colors($H$) and colors($U$) to
denote the set of color classes of $G$ that are represented in the subgraph or subset.


8.2 A first algorithm 


We now consider the models $G_S(n,p,3)$ and $G_{SB}(n,p,3)$ of a 3-colorable graph generated by
a semi-random source. Although for small constant noise rates $p$, say $p = 0.01$, it appears
at first that the adversary has a good deal of power to defeat a coloring algorithm, it turns
out that it does not. As previously mentioned, Turner’s algorithm [38] actually 3-colors
such a graph with high probability for any $p \ge n^{-1/3+\epsilon}$ for constant $\epsilon > 0$.

We present first a different algorithm that achieves the same bound as Turner’s, but 
works for the unbalanced case Gs(n,p,3) as well and has a much simpler analysis. We 
then present a straightforward improvement and a natural extension of this algorithm for 
k-colorable graphs (for constant k) for the balanced model. 

The idea for the simplest algorithm is the following. If in the adversary’s color scheme
$u \in$ blue and $v \in$ green, then the shared neighborhood $S = N(u) \cap N(v)$ contains only red
vertices. Thus, $N(S) \subseteq$ blue $\cup$ green. For $p \ge n^{-1/3+\epsilon}$ we show that with high probability,
$N(S)$ actually equals the entire set blue $\cup$ green. So, given $u$ and $v$, one can split $G$ into a
2-colorable portion $N(S)$ and an independent portion $V - N(S)$ and thus 3-color the graph.


Algorithm Two-Stage
Given: A graph $G = (V, E)$.
Output: Either a 3-coloring of $G$ or failure.

1. First try to 2-color $G$. If that works, halt with success. Otherwise, do the
   following:

2. For each pair of vertices $u, v$ (think of $u$ as a candidate green node and $v$ as a
   candidate blue node),
   (a) Let $S = N(u) \cap N(v)$.
   (b) Let $T = N(S)$.
       If $T$ is 2-colorable and $V - T$ is an independent set, then color $T$ blue and
       green, color $V - T$ red and halt. Otherwise go to the top of the loop with a
       different pair $u, v$.

3. If Step 2 did not succeed for any pair $u, v$, then halt with failure.
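The steps of Algorithm Two-Stage can be sketched as follows. This is an illustrative implementation under our own assumptions: the graph is a dictionary mapping each vertex to its set of neighbours, and the helper names `two_color`, `is_independent`, and `two_stage` are hypothetical. The 2-colorability test is a standard depth-first bipartiteness check.

```python
from itertools import combinations


def two_color(vertices, adj):
    """Return a proper 2-coloring (dict vertex -> 0/1) of the induced subgraph, or None."""
    vertices = set(vertices)
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x] & vertices:
                if y not in color:
                    color[y] = 1 - color[x]
                    stack.append(y)
                elif color[y] == color[x]:
                    return None  # odd cycle found
    return color


def is_independent(vertices, adj):
    """True if no two vertices of the set `vertices` are adjacent."""
    return all(not (adj[x] & vertices) for x in vertices)


def two_stage(adj):
    """adj: dict vertex -> set of neighbours. Returns dict vertex -> color name, or None."""
    V = set(adj)
    names = ("blue", "green")
    whole = two_color(V, adj)                      # Step 1
    if whole is not None:
        return {x: names[c] for x, c in whole.items()}
    for u, v in combinations(V, 2):                # Step 2
        S = adj[u] & adj[v]                        # (a) shared neighbourhood
        T = set().union(*(adj[s] for s in S)) if S else set()   # (b) T = N(S)
        rest = V - T
        part = two_color(T, adj)
        if part is not None and is_independent(rest, adj):
            result = {x: names[c] for x, c in part.items()}
            result.update({x: "red" for x in rest})
            return result
    return None                                    # Step 3


# Complete tripartite graph with classes {0, 1}, {2, 3}, {4, 5}.
classes = [[0, 1], [2, 3], [4, 5]]
k222 = {x: {y for d in classes if d is not c for y in d} for c in classes for x in c}
coloring = two_stage(k222)
```

Any pair that passes the test in Step 2(b) yields a legal coloring, since edges inside $T$ are handled by the 2-coloring, $V - T$ has no internal edges, and the two parts use disjoint colors.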


Theorem 8.1 (weak version) Algorithm Two-Stage 3-colors $G \leftarrow G_S(n,p,3)$ with high
probability (over the coin tosses of the semi-random source) for $p \ge n^{-1/3+\epsilon}$ and constant
$\epsilon > 0$.


Proof: For convenience, let red be the color with the most vertices in the adversary’s
3-coloring. If there are either no blue or no green vertices, then we will 2-color the graph at the
start. Otherwise, let $u$ be a green vertex and $v$ a blue vertex (in the adversary’s 3-coloring).
Then, the set $S = N(u) \cap N(v)$ contains only red vertices and so set $T = N(S) \subseteq$ blue $\cup$ green.
We now prove that with high probability, for $p \ge n^{-1/3+\epsilon}$, set $T$ contains all the blue and
green vertices.

If we view the semi-random source as choosing biases $p_{uv} \in [p, 1-p]$, then the sizes
of sets $S$ and $T$ are minimized when each $p_{uv}$ equals $p$. In that case, every vertex in
red independently has a probability $p^2$ of belonging to $S$. So, using Chernoff bounds,
$|S| \ge \frac{1}{2}|\mathrm{red}|p^2 = \Omega(np^2)$ with high probability. Now, each vertex $z \in$ blue $\cup$ green such
that $z \notin \{u,v\}$ has a probability $(1-p)^{|S|}$ of not belonging to $T$. The reason is that for
$z \notin \{u,v\}$, for each $w \in$ red, the events $A_{zw}$ that edge $(z,w)$ appears in the graph occur
with probability $p$ and are independent of each other and of the choice of $S$. So, we have:

$$\Pr[z \notin T] \le e^{-p|S|} = e^{-\Omega(np^3)} = e^{-\Omega(n^{3\epsilon})} = o(1/n).$$

That is, with high probability all vertices $z \in$ blue $\cup$ green belong to $T$. Thus, with high
probability, $T$ = blue $\cup$ green and $V - T$ = red and so for some pair $u, v$ considered, algorithm
Two-Stage succeeds. ∎


Note that if the sizes of the color classes are roughly balanced, we can speed up Algorithm 
Two-Stage considerably by choosing the vertices u and v at random. For instance, if the 
sizes of the color classes are all within constant factors, then we have a constant probability 
of selecting two “good” vertices each time. 

Algorithm Two-Stage fails when $p$ falls below $n^{-1/3}$ because then the vertices $u \in$ green
and $v \in$ blue may not share enough neighbors for $N(S)$ to equal blue $\cup$ green. However, for
$p$ below $n^{-1/3}$, set $S$ might still contain many vertices, and applying additional iterations
of the neighbor-taking process can then boost its size if the sizes of the blue, green, and
red vertex sets are roughly balanced. In fact, we can consider the following straightforward
extension of Algorithm Two-Stage that works in the balanced semi-random model. (See
Figure 8.1.)

Figure 8.1: Vertices u and v and sets S}, and S2.
Algorithm t-Stage
Given: A graph $G = (V, E)$, and integer $t$.
Output: Either a 3-coloring of $G$ or failure.
For each edge $(u,v)$:
1. Let $S_G^1 = \{u\}$, $S_B^1 = \{v\}$, and $S_R^1 = N(S_G^1) \cap N(S_B^1)$.
2. Let $S_G^2 = N(S_R^1) \cap N(S_B^1)$, $S_B^2 = N(S_R^1) \cap N(S_G^1)$, and $S_R^2 = N(S_G^2) \cap N(S_B^2)$.
3. Let $S_G^3 = N(S_R^2) \cap N(S_B^2)$, $S_B^3 = N(S_R^2) \cap N(S_G^2)$, and $S_R^3 = N(S_G^3) \cap N(S_B^3)$.
   ...
t. Let $T = N(S_R^{t-1})$.
   If $T$ is 2-colorable and $V - T$ is an independent set, then color $T$ blue and green,
   color $V - T$ red and halt. Otherwise go to the top of the loop with a different
   edge $(u,v)$.

If we have not succeeded in Step $t$ for any edge $(u,v)$, then halt with failure.
For the balanced model, we have the following stronger version of Theorem 8.1. 


Theorem 8.1 (strong version) Algorithm t-Stage will 3-color $G \leftarrow G_{SB}(n,p,3)$ with high
probability for $p \ge n^{-1/2+\epsilon}$, $\epsilon > 0$ constant, and $t \ge \log_3(1/\epsilon)$.

Algorithm t-Stage is, in fact, very similar to Turner’s No-Choice algorithm, and his
algorithm should achieve this stronger bound as well for $G_{SB}(n,p,3)$. The algorithm presented
here, while a more complicated algorithm, is easier to analyze. However, because we will
demonstrate an even better algorithm in the next section, we give here just a proof sketch
showing that for $t = 3$, the algorithm will 3-color $G \leftarrow G_{SB}(n,p,3)$ for $p \ge n^{-5/11+\epsilon}$, and
more briefly describe how this is extended.
Proof sketch: Again, if $u$ is green and $v$ is blue then for all $i$, $S_G^i \subseteq$ green, $S_B^i \subseteq$ blue,
and $S_R^i \subseteq$ red, and $T \subseteq$ blue $\cup$ green. For the given value of $p$, with high probability there
will exist an edge between two such vertices $u$ and $v$. Also, the sizes of the sets $S_G^i$, $S_B^i$, $S_R^i$,
and $T$ are minimized by the semi-random source that chooses each $p_{uv}$ to equal $p$. One
final fact to note is that for each $i \ge 1$, $S_G^i \subseteq S_G^{i+1}$, $S_B^i \subseteq S_B^{i+1}$, and $S_R^i \subseteq S_R^{i+1}$. The
general argument now is just repeated application of bounds for large deviations, being
somewhat careful about independence. For this proof sketch, we focus on the case where
$t = 3$ and show that algorithm t-Stage will 3-color $G \leftarrow G_{SB}(n,p,3)$ for $p = n^{-5/11+\epsilon}$ with
high probability. Recall that for $G \leftarrow G_{SB}(n,p,3)$ the sizes of the sets red, blue, and green
are all $\Omega(n)$.

We can imagine that the coin deciding the presence of an edge is not flipped until we
actually examine that edge. So, we first examine all edges $(u,w)$ and $(v,w)$ for $w \in$ red
and find that almost surely $|S_R^1| = \Omega(|\mathrm{red}|p^2) = \Omega(np^2)$. Next, for each $z \in$ green, we
examine the edges $(z,w)$ for $w \in S_R^1$ and the edge $(z,v)$. For $z \ne u$, these are all previously
unexamined edges. So, for $z \in$ green $- \{u\}$ we have:

$$\Pr[z \in S_G^2] = p\,(1 - (1-p)^{|S_R^1|}) = p^2 |S_R^1| (1 + o(1)) \quad (\text{using } p|S_R^1| = o(1)).$$

Thus with high probability, $|S_G^2| = \Omega(|\mathrm{green}|\, n p^4) = \Omega(n^2p^4)$ and similarly we have
$|S_B^2| = \Omega(n^2p^4)$. Now, for each $z \in$ red $- S_R^1$, we examine the edges $(z,w)$ and $(z,w')$ for
$w \in S_G^2 - S_G^1$ and $w' \in S_B^2 - S_B^1$. Again, these are all previously unexamined edges, so the same
argument as above shows that the probability $z$ belongs to $S_R^2$ is proportional to
$p^2 |S_G^2 - S_G^1|\,|S_B^2 - S_B^1|$. Thus with high probability, $|S_R^2 - S_R^1| = \Omega(n^5p^{10})$. Finally, we have
$T = N(S_R^2)$. Notice that set $T$ contains $S_G^2 \cup S_B^2$ and that for each vertex
$z \in$ (blue $\cup$ green) $- (S_G^2 \cup S_B^2)$, we have not yet examined the edges $(z,w)$ for $w \in S_R^2 - S_R^1$.
So, for each such vertex $z$,

$$\Pr[z \notin T] \le (1-p)^{|S_R^2 - S_R^1|} \le e^{-p|S_R^2 - S_R^1|} \le e^{-\Omega(n^5p^{11})} = o(1/n), \quad \text{for } p = n^{-5/11+\epsilon}.$$

So, with high probability, $T$ = blue $\cup$ green.

More generally, for $p = n^{-1/2+\epsilon}$, if $|S_R^i| = \Omega(n^{x_i}p^{2x_i})$ we will have with high probability
that $|S_R^{i+1}| = \Omega(n^{x_{i+1}}p^{2x_{i+1}})$ for $x_{i+1} = 3x_i + 2$, so long as $n^{x_{i+1}}p^{2x_{i+1}} = o(n)$. Since we begin
with $|S_R^1| = \Omega(n^{2\epsilon})$ and at each step the exponent in the size of $S_R^i$ more than triples, we can
continue with the assumption that $n^{x_{i+1}}p^{2x_{i+1}} = o(n)$ for at most $\log_3(1/\epsilon)$ iterations. Suppose
$i$ and $p$ are such that $n^{x_{i+1}}p^{2x_{i+1}} \ne o(n)$. Since the probability that Algorithm t-Stage succeeds
can only increase with larger $p$, we may for purposes of analysis decrease $p$ so that
$n^{x_{i+1}}p^{2x_{i+1}} = \sqrt{n}$. We will thus have for $z \in$ (blue $\cup$ green) $- (S_G^i \cup S_B^i)$ that
$\Pr[z \notin T] \le (1-p)^{\Omega(\sqrt{n})} \le e^{-\Omega(p\sqrt{n})} = o(1/n)$. So, set $T$ equals blue $\cup$ green with high
probability. ∎
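To see how quickly the sets grow, the following small sketch iterates the exponent recurrence, read here as $x_1 = 1$ and $x_{i+1} = 3x_i + 2$. It assumes $p = n^{-1/2+\epsilon}$, so that $n^{x}p^{2x} = n^{2\epsilon x}$, and reports the first stage at which this quantity reaches $\sqrt{n}$. The function name `stages_until_sqrt_n` is ours, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def stages_until_sqrt_n(eps):
    """Return (i, x_i) for the first stage i with n^(x_i) * p^(2 x_i) >= sqrt(n).

    Assumes p = n^(-1/2 + eps), so n^x * p^(2x) = n^(2 * eps * x), and uses the
    recurrence x_1 = 1, x_(i+1) = 3 * x_i + 2 for the exponent in |S_R^i|.
    """
    eps = Fraction(eps)
    i, x = 1, 1
    # Compare exponents of n: 2 * eps * x against 1/2.
    while 2 * eps * x < Fraction(1, 2):
        i, x = i + 1, 3 * x + 2
    return i, x
```

The exponents run 1, 5, 17, 53, and so on, so the number of stages needed grows only logarithmically in $1/\epsilon$.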


It is interesting to note that algorithm Two-Stage (or t-Stage) extends naturally to
graphs in $G_{SB}(n,p,k)$ for constant $k > 3$. The idea is that instead of selecting two vertices
$u, v$ at the start, we select $k-2$ sets $U_2, \ldots, U_{k-1}$, each $U_i$ of $i$ vertices, at the start.
For some such choice, the vertices of $U_{k-1}$ are all of different colors in colors($G$) and the
vertices of $U_{k-2}$ are all of different colors in colors($U_{k-1}$) and so forth. (That is, the vertices
of $U_{i-1}$ are all of different colors, and each is of a color used in $U_i$.) So, for $T_k = V(G)$ and
$U_i = \{u_1^i, \ldots, u_i^i\}$, for each $i \in \{2, \ldots, k-1\}$ simply let

$$T_i = N_{T_{i+1}}\bigl(N_{T_{i+1}}(u_1^i) \cap \cdots \cap N_{T_{i+1}}(u_i^i)\bigr),$$

where $N_{T_j}(X) = N(X) \cap T_j$. With high probability, for $p \ge n^{-1/k+\epsilon}$, we will be able to
assign one color to each set $T_i - T_{i-1}$ for $i \ge 3$ and two colors to the set $T_2$, and thus $k$-color
the graph. This yields the same bounds as those achieved by Turner. Again, we will not go
into the analysis in detail because in the next section, we show how a quite different idea
can be used to get even better bounds.


8.3 A better algorithm: k = 3 


We now describe a different style of algorithm that improves upon the above bound in
the balanced case, to 3-color graphs $G_{SB}(n,p,3)$ with high probability for $p \ge n^{-0.6+\epsilon}$.
The algorithm, while quite simple, requires a more involved probabilistic analysis than
the previous one. In particular, we will need to use the Janson inequality [11] to bound
probabilities of “nearly” independent events based on pairwise dependencies.^1

The algorithm is based on the following simple observation. If in a 3-colorable graph $G$
there are two vertices $x$ and $y$ both adjacent to a pair of vertices $u$ and $v$ that are adjacent
to each other, then $x$ and $y$ must be the same color in any legal 3-coloring. We call the
subgraph induced by $\{x, u, v, y\}$ a link between $x$ and $y$. (See Figure 8.2.)

Figure 8.2: A link between $x$ and $y$.

Footnote 1: The results described in this section and Section 8.4 are based on joint work with Joel Spencer.


At first glance, it would seem the above observation does not help, since for fixed vertices
$x$ and $y$, the probability there exists a link between $x$ and $y$ is at most $O(n^2p^5)$. (There
are $O(n^2)$ possible pairs $(u,v)$ and for each pair the probability all necessary edges exist
is $p^5$.) Thus, the probability there is a link between $x$ and $y$ is much less than 1 even for
$p = o(n^{-0.4})$.

The key fact to note, however, is that we do not need a link between every pair of, say,
red vertices $x$ and $y$. All we need is that for each such pair there is a sequence of links
between $x$ and some $x'$, between $x'$ and some $x''$, and so forth, until eventually at some
point we reach $y$. We will call such a sequence a “chain”.

Another way to think of this observation is that given a graph $G$ we can create a new
graph $H$ as follows. The vertex set $V(H)$ equals $V(G)$, and if $x$ and $y$ are connected by a
link in $G$, we put an edge between $x$ and $y$ into $H$. So, while edges in $G$ exist only between
vertices of different color, edges in $H$ exist only between vertices of the same color (in $G$).
The “key observation” is then just that in order to easily select the set of red vertices in $G$,
we do not need red to be a clique in $H$, just a connected set. So, the simple algorithm is as
follows.


Algorithm Chain
Given: A graph $G = (V, E)$.
Output: A 3-coloring of $G$ or failure.

1. Create graph $H = (V, F)$, where
   $F = \{(x,y) \mid \exists$ a link in $G$ between $x$ and $y\}$.

2. Find all connected components in $H$. If there are exactly three, halt with success,
   producing as output the vertices of the three components labeled as red, blue,
   and green.
   Otherwise, if there are more than 3 components, then halt with failure.
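Algorithm Chain can be sketched with a union-find structure, as below. This is a minimal illustration under our own assumptions: the graph is a dictionary mapping each vertex to its set of neighbours, the name `chain_coloring` is hypothetical, and the graph $H$ is never built explicitly. Instead, for each edge $(u,v)$ of $G$ all common neighbours of $u$ and $v$ are merged into one class, which yields the same connected components as $H$ because every two such common neighbours are joined by a link through $(u,v)$.

```python
def chain_coloring(adj):
    """adj: dict vertex -> set of neighbours. Returns a list of three vertex sets, or None."""
    parent = {x: x for x in adj}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Step 1: x and y are linked when some edge (u, v) of G has both endpoints
    # adjacent to x and to y, so each edge's common neighbourhood forms one class.
    for u in adj:
        for v in adj[u]:
            common = list(adj[u] & adj[v])
            for x, y in zip(common, common[1:]):
                parent[find(x)] = find(y)

    # Step 2: collect the connected components of H.
    components = {}
    for x in adj:
        components.setdefault(find(x), set()).add(x)
    return list(components.values()) if len(components) == 3 else None


# Complete tripartite graph with classes {0, 1}, {2, 3}, {4, 5}.
classes = [[0, 1], [2, 3], [4, 5]]
k222 = {x: {y for d in classes if d is not c for y in d} for c in classes for x in c}
parts = chain_coloring(k222)
```

The sketch returns failure whenever the number of components differs from three, which covers the case of more than three components named in Step 2.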
8.3.1 Motivation


As mentioned above, each connected component in graph $H$ produced by Algorithm Chain
consists of vertices that must be the same color under any legal 3-coloring of $G$. The
following sections contain a proof that when $p \ge n^{-0.6+\epsilon}$, with high probability there will
be only 3 such components in $H$. Let us first, however, give a motivational argument,
supposing that each edge between two vertices of the same color were placed independently
with the same probability into $H$.

Let $p = n^{-0.6+\epsilon}$ and for simplicity, assume that $\epsilon < 1/5$. Given two vertices $x$ and
$y$ of the same color (say red) in $G$, the expected number of links between $x$ and $y$ in $G$
is $\Theta(n^2p^5) = \Theta(n^{-1+5\epsilon})$. For $\epsilon < 1/5$, the probability there exists a link between $x$ and $y$,
and thus the probability that $x$ and $y$ are neighbors in $H$, is $\Theta(n^{-1+5\epsilon})$ as well. Thus, if
we consider the subgraph in $H$ induced by the set red, the average degree of each vertex is
$\Theta(n^{5\epsilon})$. It is well known that in the random graph model $G(n,p)$, once the average degree
exceeds $K \log n$ for sufficiently large $K$, the graph is connected with high probability. So,
if the edges in the red subgraph of $H$ were placed randomly, the red set would be a single
connected component almost surely since $n^{5\epsilon} > K \log n$.


8.3.2 Janson’s inequality 


Janson’s inequality [11] is used in the following setting. Consider a universe $U$ of points
and a collection of subsets $X_1, \ldots, X_m$ of $U$. We now create a new subset $S$ of $U$ by placing
each $j \in U$ into $S$ independently with probability $p$. Let $A_i$ be the event that $X_i \subseteq S$.
Janson’s inequality bounds the probability that no set $X_i$ is contained within $S$: that is,
the probability that no event $A_i$ occurs.^2

Define:

$$M = \prod_{i=1}^{m} \Pr[\overline{A_i}].$$

If the $X_i$ were all disjoint, then the events $A_i$ would all be independent and so $M$ would
be the probability that no $A_i$ occurs. If the $X_i$ are not disjoint, then the events $A_i$ are
not independent. However, Janson’s inequality allows us to bound the probability no $X_i$ is
contained in $S$ by looking only at pairwise dependencies. In particular, Janson’s inequality
states that:

$$M \le \Pr[\text{no } X_i \text{ is contained in } S] \le M e^{\Delta/(2(1-\lambda))} \qquad (8.1)$$

where $\lambda$ is an upper bound on $\Pr[A_i]$ and we define:

$$\Delta = \sum_{\substack{\text{ordered pairs } (i \ne j) \\ X_i \cap X_j \ne \emptyset}} \Pr[A_i \text{ and } A_j].$$

Notice that if $\lambda < 1/2$ and $\Delta = o(1)$, then by equation (8.1), Pr[no $X_i$ is contained in $S$]
$= M(1+o(1))$. That is, under these two conditions, the probability is within $1 + o(1)$ of
what the probability would be had the $A_i$ been independent.

Footnote 2: Janson’s inequality works even if the probabilities for each point $j$ are different, so long as the
points are placed into $S$ independently. We will not need this fact for our purposes.

In the study of random graphs Janson’s inequality is often used to show that some
structure exists with high probability. For example, suppose one wishes to prove that the
graph $G \leftarrow G(n,p)$ contains a triangle with high probability for $p \gg 1/n$. For such a
setting, we let $U$ be the set of all edges of the $n$-clique $K_n$ (thought of as possible edges
in $G$) and have one $X_i$ for each set of three edges corresponding to a triangle. Janson’s
inequality then provides an upper bound on the probability that no triangle is contained in
$G$. In the setting of this thesis, we will use Janson’s inequality to prove that in the balanced
semi-random model, for sufficiently large noise rate $p$, for any $x, y \in$ red there will be a
chain between $x$ and $y$ with probability at least $1 - o(n^{-2})$.
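The triangle example can be worked numerically with a short sketch. It assumes the commonly stated form of Janson's inequality in which $\Delta$ is summed over ordered pairs and the upper bound is $M e^{\Delta/(2(1-\lambda))}$ with $\lambda = p^3$; the function name `janson_triangle_bounds` is ours.

```python
from math import comb, exp


def janson_triangle_bounds(n, p):
    """Return (M, Delta, upper) for the event "G(n, p) contains no triangle".

    M     = product over triangles of Pr[triangle absent] = (1 - p^3)^C(n, 3)
    Delta = sum over ordered pairs of distinct triangles sharing an edge of
            Pr[both present]; two such triangles span 5 edges in total
    upper = M * exp(Delta / (2 * (1 - lam))), where lam = p^3 bounds Pr[A_i]
    """
    triangles = comb(n, 3)
    lam = p ** 3
    M = (1 - lam) ** triangles
    # Each triangle has 3 edges, and each edge lies in n - 3 other triangles.
    ordered_pairs = triangles * 3 * (n - 3)
    delta = ordered_pairs * p ** 5
    upper = M * exp(delta / (2 * (1 - lam)))
    return M, delta, upper


M, delta, upper = janson_triangle_bounds(4, 0.5)
```

For $n = 4$ and $p = 1/2$, exactly 41 of the 64 labelled graphs are triangle-free, so the true probability $41/64$ can be compared against the two bounds.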


The following definitions are taken (roughly) from Spencer [35].^3

Definition 8.2 Let $H$ be a graph in which some subset $R$ of its vertices are specified to be
“roots” and $H$ has no edges between vertices in $R$. We will call the pair $(R, H)$ a rooted
graph, or simply say that $H$ is a rooted graph when $R$ is implicit. Define edges($H$) to be
the total number of edges in $H$ and nonroots($H$) to be the number of vertices in $H$ excluding
roots. Define the density of $H$ to be dens($H$) = edges($H$)/nonroots($H$).

We will always consider rooted graphs to be graphs on a constant number of vertices, and
examine the number of copies of such graphs in larger $n$-vertex graphs.


Definition 8.3 A rooted graph $(R, H)$ is strictly balanced if for some constant $\epsilon' > 0$,
for every proper subgraph $(R, H')$, we have dens($H'$) $\le$ dens($H$) $- \epsilon'$. (By a proper subgraph,
we mean that $H' \subset H$.)


Definition 8.4 Suppose $(R, H)$ is a rooted graph and $G = (V, E)$ is a graph with $V \supseteq R$.
An image of $H$ over $R$ in $G$ is a subgraph of $G$ isomorphic to $H$ by a map which is the
identity on $R$. When $R$ is clear from context, we will drop the phrase “over $R$”.

So, for example, if $H$ is a triangle with a root vertex $x$, then the images of $H$ over $\{x\}$ in
$G$ are all triangles in $G$ containing vertex $x$.

Footnote 3: The term “image” used here is essentially the same as an “extension” in Spencer’s paper,
except he counts maps while we count images of maps.

Definition 8.5 Suppose $(R, H)$ is some strictly balanced rooted graph and $V$ is a set of $n$
vertices containing $R$. Let $X_1, \ldots, X_m$ denote all distinct images of $H$ in the clique $K_n$ on
$V$. That is, the $X_i$ are all possible images of $H$ fixing root set $R$ in an $n$-vertex graph. For
some model $M$ (such as $G(n,p)$ or $G_{SB}(n,p,k)$), we define:

$$\Delta(H, M) = \sum_{\substack{\text{ordered pairs } i \ne j \\ E(X_i) \cap E(X_j) \ne \emptyset}} \Pr[X_i \subseteq G \text{ and } X_j \subseteq G \mid G \leftarrow M].$$


Spencer [36][35] proves the following theorem for random graphs $G(n,p)$.

Theorem 8.2 (Spencer) Let $(R, H)$ be a strictly balanced rooted graph on a constant
number of vertices with $v$ = nonroots($H$) and $e$ = edges($H$). Then there exists $\epsilon^* > 0$ so
that if $p \le n^{-v/e+\epsilon^*}$, then $\Delta(H, G(n,p)) = o(1)$.

Spencer then uses this fact to prove that with high probability, for $p = n^{-v/e+\epsilon^*}$, $G$ will
contain some image of $H$.


We can use Spencer’s theorem to prove the following. 


Theorem 8.3 Let $G_{SB}^{\emptyset}(n,p,k)$ be the semi-random model with an adversary that always
elects not to place edges into the graph. Let $(R, H)$ be a strictly balanced rooted graph on
a constant number of vertices with $v$ = nonroots($H$) and $e$ = edges($H$). Then there exists
$\epsilon^* > 0$ so that if $p = n^{-v/e+\epsilon^*}$, then $\Delta(H, G_{SB}^{\emptyset}(n,p,k)) = o(1)$.


Proof: Theorem 8.3 follows immediately from Spencer’s theorem (Theorem 8.2). Let
$X_1, \ldots, X_m$ denote the images of $H$ in the clique $K_n$ and let $A_i$ be the event that $X_i \subseteq G$.
Each edge $(x,y)$ is placed into $G$ with probability at most $p$ (either probability 0 if $x$ and $y$
are in the same color class or else probability $p$ if they are in different color classes). Thus,
for any pair of events $A_i, A_j$, we have:

$$\Pr[A_i \wedge A_j \mid G \leftarrow G_{SB}^{\emptyset}(n,p,k)] \le \Pr[A_i \wedge A_j \mid G \leftarrow G(n,p)].$$

For sake of completeness, however, we provide a direct proof here as well following the
argument of Spencer [35].

We prove the theorem by considering separately for each fixed value of $s \in \{1, \ldots, v\}$,
the pairs $X_i, X_j$ that share $s$ vertices in common in addition to the roots. Note that if
$s = 0$, then the graphs $X_i$ and $X_j$ share no edges and so are not counted in the summation
in Definition 8.5. The number of pairs $X_i$ and $X_j$ sharing $s$ vertices in common in addition
to the roots is $O(n^{2v-s})$ since there are $2v - s$ different vertices and only a constant number
of permutations.

Let $\epsilon' > 0$ be a value such that every proper subgraph of $H$ containing the roots has
density at most dens($H$) $- \epsilon'$ (see the definition of “strictly balanced”). If $s = v$, then
since $X_i$ and $X_j$ are distinct, there must be at least $e + 1$ edges in $X_i \cup X_j$. For any fixed
edge, the probability that edge belongs to $G$ is at most $p$ (it could be smaller, e.g. 0, if
the two endpoints are in the same color class in the graph). So, the contribution to the
summation from such pairs $X_i$ and $X_j$ is at most $O(n^v p^{e+1}) = O(n^{v+(e+1)(-v/e+\epsilon^*)}) =
O(n^{-v/e+(e+1)\epsilon^*}) = o(1)$, for $\epsilon^*$ sufficiently small.

Now consider $s < v$ and fix a pair $X_i$ and $X_j$ sharing $s$ vertices in common in addition
to roots. Let $S$ be the subgraph $X_i \cap X_j$; that is, $V(S) = V(X_i) \cap V(X_j)$ and $E(S) =
E(X_i) \cap E(X_j)$. Since $(R, H)$ is strictly balanced, we know that $|E(S)|/s \le e/v - \epsilon'$ for
some $\epsilon' > 0$. So, $|E(X_i) \cup E(X_j)| = 2e - |E(S)| \ge 2e - se/v + s\epsilon'$ and thus the probability
that both $X_i$ and $X_j$ are subgraphs of $G$ is at most $p^{2e - se/v + s\epsilon'}$.

Finally, summing over all $O(n^{2v-s})$ pairs $X_i$ and $X_j$ sharing $s$ vertices in common besides
the roots, the contribution to $\Delta$ is at most:

$$O(n^{2v-s} p^{2e - se/v + s\epsilon'}) = O(n^{2v-s}\, n^{(-v/e+\epsilon^*)(2e - se/v + s\epsilon')})$$
$$= O(n^{2v - s - 2v + s - s\epsilon' v/e + \epsilon^*(2e - se/v + s\epsilon')})$$
$$= O(n^{-s\epsilon' v/e + \epsilon^*(2e - se/v + s\epsilon')})$$
$$= o(1) \quad (\text{for } \epsilon^* \text{ sufficiently small}).$$

Thus, the contribution to $\Delta$ from each value of $s \in [1, v]$ is $o(1)$, and since there are only a
constant number of different choices for $s$, we have $\Delta(H, G_{SB}^{\emptyset}(n,p,k)) = o(1)$. ∎


We will use this fact in the next section to prove that in balanced semi-random
3-colorable graphs, with high probability there will exist chains between every pair of vertices
$x$ and $y$ in the same color class.


8.3.3 The main theorem 
We now prove the following theorem. 


Theorem 8.4 Algorithm Chain will 3-color $G \leftarrow G_{SB}(n,p,3)$ with high probability for
$p \ge n^{-3/5+\epsilon}$ for any constant $\epsilon > 0$.

The idea of the proof is to consider chains of some length $r$ between two fixed endpoints
(roots) $x$ and $y$ and to prove that with probability $1 - o(n^{-2})$ at least one such chain exists
in $G$. This will be done by showing that chains are strictly balanced and then applying
Theorem 8.3 and Janson’s inequality.

Before proving Theorem 8.4, however, let us first formally define a chain and prove a
few preliminary lemmas.

Figure 8.3: A chain of length $r$ between $w_0$ and $w_r$.


Definition 8.6 A chain $C$ of length $r$ is a rooted graph on $3r+1$ vertices
$\{w_0, u_0, v_0, w_1, u_1, v_1, \ldots, w_{r-1}, u_{r-1}, v_{r-1}, w_r\}$ and $5r$ edges, where:

$$E(C) = \bigcup_{i=0}^{r-1} \{(w_i, u_i), (w_i, v_i), (u_i, v_i), (u_i, w_{i+1}), (v_i, w_{i+1})\}.$$

(See Figure 8.3.) Vertices $w_0$ and $w_r$ are the roots of the chain and will be called the
endpoints.
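The definition can be checked mechanically with a short sketch that builds the edge set of a chain and computes its density as a rooted graph. The vertex encoding `("w", i)`, `("u", i)`, `("v", i)` and the function names `chain_edges` and `chain_density` are our own choices for illustration.

```python
from fractions import Fraction


def chain_edges(r):
    """Edges of a chain of length r on vertices ('w', i), ('u', i), ('v', i)."""
    edges = []
    for i in range(r):
        w, u, v, w_next = ("w", i), ("u", i), ("v", i), ("w", i + 1)
        # The five edges of the i-th link.
        edges += [(w, u), (w, v), (u, v), (u, w_next), (v, w_next)]
    return edges


def chain_density(r):
    """dens(C) = edges(C) / nonroots(C), with the roots w_0 and w_r excluded."""
    edges = chain_edges(r)
    vertices = {x for e in edges for x in e}
    nonroots = len(vertices) - 2
    return Fraction(len(edges), nonroots)
```

For a chain of length $r$ this gives $5r$ edges on $3r - 1$ non-root vertices, so the density is $5r/(3r-1)$, which decreases toward $5/3$ as the chain grows.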


Definition 8.7 If $G \leftarrow G_{SB}(n,p,3)$, we will say that $C$ is a potential chain between two
vertices $w_0$ and $w_r$ if all $w_i$ are in the same color class, and for each $i$, vertices $u_i$, $v_i$, and
$w_i$ are all in different color classes.


Note that nonroots($C$) = $3r - 1$ and edges($C$) = $5r$, and there are no edges in $C$ between
the roots. Also, note that the ratio $-$nonroots($C$)/edges($C$) $= -3/5 + \frac{1}{5r}$, so if $p =
n^{-3/5+\epsilon}$ as in the bound for Theorem 8.4, then for $C$ a sufficiently long chain we will have
$p = n^{-\mathrm{nonroots}(C)/\mathrm{edges}(C)+\epsilon^*}$ for some $\epsilon^* > 0$. This is the form of the condition on $p$ in
Theorem 8.3 for proving that $\Delta = o(1)$. Our immediate goal is thus to prove that chains
are strictly balanced.


Fact 8.8 If $C$ is a chain of length $r$, then dens($C$) = edges($C$)/nonroots($C$) $= \frac{5}{3} + \frac{5}{9r-3}$.
The following is a useful fact about subgraphs of chains. 
Lemma 8.5 Let S be a subgraph of a chain C. Then |E(S)| < 5/3(|V(S)| - 1). 


Proof: Let $C$ have vertex set $\{w_0, u_0, v_0, \ldots, w_{r-1}, u_{r-1}, v_{r-1}, w_r\}$. Let $L_i$ be the $i$th link in $C$; that is, $L_i = C|_{\{w_i, u_i, v_i, w_{i+1}\}}$. For convenience, partition the vertices of $S$ into disjoint sets $V_i = V(S) \cap [V(L_i) - \{w_{i+1}\}]$, and partition the edges of $S$ into disjoint sets $E_i = E(S) \cap E(L_i)$, for $0 \le i \le r$. So, $S = (\bigcup V_i, \bigcup E_i)$.

For a given index $i$, if $V_i$ is not empty and $w_{i+1} \in V(S)$, then $|E_i|/|V_i| \le 5/3$. One can easily check that the maximum value of this ratio occurs when $E_i$ and $V_i$ are both “full” (sizes 5 and 3 respectively). If $V_i$ is non-empty but $w_{i+1} \notin V(S)$, then: if $|V_i| = 3$ we have $|E_i| \le 3$, if $|V_i| = 2$ we have $|E_i| \le 1$, and if $|V_i| = 1$ we have $|E_i| = 0$. Since there must exist some $i$ such that $V_i$ is non-empty and $w_{i+1} \notin V(S)$, this implies:

$$|E(S)| \le \max\left\{ \tfrac{5}{3}(|V(S)| - 3) + 3,\; \tfrac{5}{3}(|V(S)| - 2) + 1,\; \tfrac{5}{3}(|V(S)| - 1) + 0 \right\} = \tfrac{5}{3}(|V(S)| - 1). \qquad ∎$$
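For small chains, the inequality of Lemma 8.5 can also be confirmed by exhaustive search. The sketch below is only a sanity check under stated assumptions, not part of the proof: the function names are hypothetical, and it tests vertex-induced subgraphs only, which suffices because an induced subgraph has the most edges among subgraphs on a given vertex set.

```python
from itertools import combinations


def chain_edges(r):
    """Edge list of a chain of length r, following Definition 8.6."""
    edges = []
    for i in range(r):
        w, u, v, w_next = ("w", i), ("u", i), ("v", i), ("w", i + 1)
        edges += [(w, u), (w, v), (u, v), (u, w_next), (v, w_next)]
    return edges


def lemma_8_5_holds(r):
    """Check |E(S)| <= (5/3)(|V(S)| - 1) for every non-empty induced subgraph S."""
    edges = chain_edges(r)
    vertices = sorted({x for e in edges for x in e})
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            induced = sum(1 for a, b in edges if a in chosen and b in chosen)
            # Compare 3|E(S)| with 5(|V(S)| - 1) to stay in integer arithmetic.
            if 3 * induced > 5 * (size - 1):
                return False
    return True
```

The search is exponential in the number of vertices, so it is only practical for very short chains.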


We can now use Lemma 8.5 to prove that chains are strictly balanced, and so allow easy application of Janson’s inequality.

Lemma 8.6 Let $S$ be a subgraph of a chain $C$ of some constant length $r$ such that $V(S)$ contains the endpoints $w_0$ and $w_r$ but $V(S)$ does not contain all the vertices of $C$. Then, for some constant $\epsilon' = \epsilon'(r) > 0$,

$$\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon'.$$

That is, chains are strictly balanced.


Proof: Since we are giving an upper bound on the number of edges in S, we may as well 
assume that S is a vertex-induced subgraph. 

First, suppose $S$ consists of just one connected component. So, $S$ contains vertices $\{w_0, w_1, \ldots, w_r\}$ and at least one of $\{u_i, v_i\}$ for each $i < r$. Thus, for each vertex in $C$ missing from $S$, there must be at least 3 edges of $C$ missing from $S$ as well: if $u_i \notin V(S)$ then $(w_i, u_i), (u_i, v_i), (u_i, w_{i+1}) \notin E(S)$. So, if $m$ vertices of $C$ are missing from $S$, then:

$$\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{\mathrm{edges}(C) - 3m}{\mathrm{nonroots}(C) - m} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon' \quad \text{for some } \epsilon' > 0,$$

because $\frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} < 3$ and both $\mathrm{edges}(C)$ and $\mathrm{nonroots}(C)$ are constant.

If $S$ consists of more than one connected component, then $w_0$ and $w_r$ cannot be in the same component since we have assumed that $S$ is vertex-induced. We can thus partition $S$ into two disjoint subgraphs, $S_{start}$ and $S_{rest}$, where $S_{start}$ is the component containing $w_0$ and $S_{rest}$ is everything else (and need not be connected). So, $\mathrm{nonroots}(S) = |V(S)| - 2 = |V(S_{start})| + |V(S_{rest})| - 2$. Applying Lemma 8.5, we get:

$$\mathrm{edges}(S) = |E(S_{start})| + |E(S_{rest})| \le \tfrac{5}{3}(|V(S_{start})| - 1) + \tfrac{5}{3}(|V(S_{rest})| - 1) = \tfrac{5}{3}(|V(S)| - 2) = \tfrac{5}{3}\,\mathrm{nonroots}(S).$$

So, $\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon'$ for $\epsilon' = \frac{5}{3\,\mathrm{nonroots}(C)}$, by Fact 8.8. ∎


Proof of Theorem 8.4: First, it is clear from the description of Algorithm Chain 
that adding additional edges into the graph G cannot decrease the probability of success. 
Therefore, in order to prove Theorem 8.4, it is enough to prove that Algorithm Chain 
succeeds when each potential edge is placed into the graph with probability exactly p; that 
is, the adversary A always chooses not to place edges into the graph. It is similarly also enough to prove the theorem when $p$ exactly equals $n^{-3/5+\epsilon}$ for some constant $\epsilon > 0$. Let $r \in \mathbb{Z}$, $\hat{\epsilon} > 0$ be constants such that

$$p = n^{-3/5 + \frac{1}{5r} + \hat{\epsilon}}. \qquad (8.2)$$

By Lemma 8.6, chains $C_r$ of length $r$ are strictly balanced. So, by Theorem 8.3, there exists $\epsilon^* > 0$ such that if $p < n^{-\mathrm{nonroots}(C_r)/\mathrm{edges}(C_r) + \epsilon^*}$ then $\Delta = \Delta(C_r, G_{SB}(n,p,3)) = o(1)$. Because additional edges cannot decrease the probability of success, we may for the purposes of analysis assume that:

$$\hat{\epsilon} < \epsilon^*. \qquad (8.3)$$

Thus, since $-\frac{\mathrm{nonroots}(C_r)}{\mathrm{edges}(C_r)} = -\frac{3}{5} + \frac{1}{5r}$, we have by Theorem 8.3 that:

$$\Delta = o(1). \qquad (8.4)$$


Fix two points $x$ and $y$ in the same color class in $G$; without loss of generality, say $x, y \in$ red. We now show that with probability $1 - o(n^{-2})$, $x$ and $y$ are connected by a chain of length $r$, with $r$ as in equation (8.2). This will immediately imply Algorithm Chain succeeds.

In fact, the analysis we provide to show that with high probability there is a chain between $x$ and $y$ holds more generally in the random model $G(n,p)$ for any strictly balanced rooted graph $H$ where $p > n^{-\mathrm{nonroots}(H)/\mathrm{edges}(H)+\epsilon}$. This general fact is proven by Spencer [36][35]. The proof of Spencer’s theorem can be seen to hold in the $G_{SB}(n,p,k)$ model as well, so long as the number of images of $H$ containing no edges between vertices in the same color class is $\Theta(n^{\mathrm{nonroots}(H)})$. This is the case for chains, but is not the case, for example, for non-$k$-colorable graphs. For completeness, we provide a direct proof for chains along the lines of Spencer’s arguments here.

Label each potential chain of length $r$ between $x$ and $y$ as some $C_i$ for $1 \le i \le m$, where the number of such potential chains is $m = 2^r (\Theta(n))^{3r-1}(1 - o(1))$, since there are two color choices for each $(u,v)$ pair, there are $\Theta(n)$ vertices in each color class in $G_{SB}(n,p,3)$, and $r$ is constant. Thus, $m = \Theta(n^{3r-1})$. We bound the probability that $x$ and $y$ are not connected by a chain of length $r$ by applying Janson’s inequality. The universe $U$ corresponds to the set of $O(n^2)$ potential edges in $G$ and the sets $X_i$ correspond to the potential chains $C_i$ of length $r$ between $x$ and $y$, where $X_i$ is the set of all edges in $C_i$. Every $C_i$ has $5r$ edges, each in $G$ with probability $p$. So,

$$M = \prod_{i=1}^{m} \Pr[C_i \not\subseteq G] = \left(1 - p^{5r}\right)^{\Theta(n^{3r-1})} \le e^{-p^{5r}\,\Theta(n^{3r-1})} = e^{-\Theta(n^{-3r+1+5r\hat{\epsilon}+3r-1})} = e^{-\Theta(n^{5r\hat{\epsilon}})} = o(1/n^2). \qquad (8.5)$$


Let us now consider the exponential correction term in Janson’s inequality. For a fixed potential chain $C_i$, the probability $\Pr[C_i \subseteq G] = p^{5r} = o(1)$. By our choice of $\hat{\epsilon}$, we have $\Delta = o(1)$ as well (equation 8.4). Thus, the correction term is $1 + o(1)$.

We now apply Janson’s inequality using the bound on the above term together with the bound on $M$ in equation (8.5). We thus get that Pr[$x$ and $y$ are not connected by a chain of length $r$ in $G$] $\le M(1 + o(1))$, which equals $o(1/n^2)$. Since there are $O(n^2)$ pairs of vertices, a union bound shows that with high probability all pairs of vertices from the same color class are connected by some such chain, and Algorithm Chain succeeds. ∎


8.4 A better algorithm: general k 


We can extend the results of the previous section to graphs of higher chromatic number $k$. A simple way to do this is just to replace the notion of a “link” by that of a “$t$-link” defined as follows.

Definition 8.9 A $t$-link for some constant $t$ is a $(t+2)$-vertex graph consisting of two vertices $x$ and $y$ called the endpoints both fully connected to a $t$-clique. (See Figure 8.4.)

Equivalently, a $t$-link between $x$ and $y$ is a $(t+2)$-clique with the edge $(x,y)$ removed. Note that if two vertices in a $k$-colorable graph are endpoints of a $(k-1)$-link, then they must be the same color in any legal $k$-coloring. Using this fact, we can get the following natural generalization of Algorithm Chain to graphs of constant chromatic number $k \ge 3$.
Algorithm t-Chain
Given: $G = (V, E)$, a $k$-colorable graph.
Output: A $k$-coloring of $G$ or else failure.

1. Create graph $H = (V, F)$, where
   $F = \{(x,y) \mid \exists$ a $(k-1)$-link in $G$ between $x$ and $y\}$.

2. Find all connected components in $H$. If there are exactly $k$ components, then halt with success, producing those components as the color classes of $G$. Otherwise, if there are more than $k$ components, then halt with failure.

Figure 8.4: A 4-link between $x$ and $y$.

k                  |  3           4            5             6
p value (fraction) |  n^{-3/5}    n^{-4/9}     n^{-5/14}     n^{-3/10}
(decimal)          |  n^{-0.6}    n^{-0.444}   n^{-0.357}    n^{-0.3}

Table 8.1: Algorithm t-Chain succeeds with high probability for $p$ at least this value times $n^{\epsilon}$.
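The two steps of Algorithm t-Chain can be sketched as follows. This is an illustrative brute-force version under stated assumptions, not the thesis’s implementation: vertices are assumed to be the integers `0..n-1`, the function name `t_chain_coloring` is hypothetical, a $(k-1)$-link between $x$ and $y$ is detected by searching the common neighborhood of $x$ and $y$ for a $(k-1)$-clique, and components are merged with a small union-find. The search takes time roughly $n^{k+1}$, which is polynomial for constant $k$.

```python
from itertools import combinations


def t_chain_coloring(n, edges, k):
    """Return the k color classes found by the t-Chain rule, or None on failure."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    def has_clique(candidates, size):
        # Brute-force search for a clique of the given size among the candidates.
        return any(
            all(b in adj[a] for a, b in combinations(group, 2))
            for group in combinations(sorted(candidates), size)
        )

    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Step 1: x and y are joined in H when both are fully connected to a (k-1)-clique.
    for x, y in combinations(range(n), 2):
        if has_clique(adj[x] & adj[y], k - 1):
            parent[find(x)] = find(y)

    # Step 2: the connected components of H are the candidate color classes.
    classes = {}
    for v in range(n):
        classes.setdefault(find(v), []).append(v)
    if len(classes) != k:
        return None
    return sorted(classes.values())
```

For example, on the complete 3-partite graph with parts $\{0,1\}$, $\{2,3\}$, $\{4,5\}$ and $k = 3$, each pair inside a part is joined by a 2-link, so the three parts are recovered as the color classes.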


Theorem 8.7 Algorithm t-Chain $k$-colors $G \leftarrow G_{SB}(n,p,k)$ for $p \ge n^{\frac{-2k}{k(k+1)-2}+\epsilon}$, ($\epsilon > 0$), with high probability. (See Table 8.1.)

In order to prove Theorem 8.7, we consider $(k-1)$-chains of some constant length $r$. We then prove, analogously to the previous section, that $(k-1)$-chains are strictly balanced, so Theorem 8.3 applies. We define a $t$-chain as follows.

Definition 8.10 A $t$-chain of length $r$ is a sequence of $r$ $t$-links connected at their endpoints. For a $t$-chain $C$ with fixed endpoints $x$ and $y$, we will treat the chain as a rooted graph, with $x$ and $y$ as the roots.


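Fact 8.11 If $C$ is a $t$-chain of length $r$, then $|V(C)| = r(t+1)+1$, $\mathrm{nonroots}(C) = r(t+1)-1$, and $|E(C)| = \mathrm{edges}(C) = r\left[\binom{t+2}{2} - 1\right] = \frac{r}{2}[(t+1)(t+2)-2]$. So, $\frac{|E(C)|}{|V(C)|-1} = \frac{1}{2}\left[t+2-\frac{2}{t+1}\right] = \frac{t}{2} + \frac{t}{t+1}$.

Note that if $C_r$ is a $(k-1)$-chain of length $r$, then the term $\frac{-2k}{k(k+1)-2}$ in the statement of Theorem 8.7 equals $\lim_{r\to\infty} -\frac{\mathrm{nonroots}(C_r)}{\mathrm{edges}(C_r)}$.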
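The limiting exponents can be reproduced directly from the counts in Fact 8.11. The sketch below uses exact rational arithmetic; the function names are hypothetical and the code is only a worked check of the arithmetic, with $t = k - 1$ as in the text.

```python
from fractions import Fraction


def t_chain_exponent(k, r):
    """-nonroots(C)/edges(C) for a (k-1)-chain of length r, from Fact 8.11."""
    t = k - 1
    nonroots = r * (t + 1) - 1
    edges = r * ((t + 2) * (t + 1) // 2 - 1)
    return Fraction(-nonroots, edges)


def limiting_exponent(k):
    """The r -> infinity limit, -2k / (k(k+1) - 2)."""
    return Fraction(-2 * k, k * (k + 1) - 2)
```

For $k = 3, 4, 5, 6$ the limit evaluates to $-3/5$, $-4/9$, $-5/14$, and $-3/10$, and for finite $r$ the exponent exceeds the limit by exactly $\frac{2}{r[k(k+1)-2]}$.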


82 Chapter 8. Semi-random graphs 


As in the proof of Theorem 8.4, the first step is to prove that $t$-chains are strictly balanced. We first prove a fact analogous to Lemma 8.5.

Lemma 8.8 Let $S$ be a subgraph of a $t$-chain $C$ of length $r$. Then:

$$|E(S)| \le \left[\frac{t}{2} + \frac{t}{t+1}\right](|V(S)| - 1).$$


Proof: Let $L$ be a $t$-link and let $g(t) = \frac{t}{2} + \frac{t}{t+1}$. Define $\mathrm{dens}_v(H) = \frac{|E(H)|}{|V(H)|-1}$, so:

$$\mathrm{dens}_v(L) = \mathrm{dens}_v(C) = g(t). \qquad (8.6)$$

Claim 1: If $S \subseteq L$, then $\mathrm{dens}_v(S) \le g(t)$.

Proof of 1: We may assume $S$ is vertex-induced. Thus, $S$ is either a $(t+2-c)$-clique or else $S$ is a $(t-c)$-link for some $c \ge 1$. In the latter case, the claim follows from equation (8.6) since $g(t)$ is an increasing function of $t$. In the former case, we have $\mathrm{dens}_v(S) = \frac{t+2-c}{2} = \frac{t}{2} + 1 - \frac{c}{2} \le g(t)$ for $c \ge 1$. □

Now, we prove the lemma that if $S \subseteq C$ then $\mathrm{dens}_v(S) \le g(t)$ by induction on the length of $C$. The base case is proved in the above claim, so we may assume the lemma holds for any $t$-chain of length $r-1$. Let $C'$ be the $t$-chain of length $r-1$ consisting of the first $r-1$ links in $C$ and let $S'$ be $S$ restricted to $C'$. So, $\mathrm{dens}_v(S') \le g(t)$. Let $L$ be the last link in $C$ and let $S_L$ be $S$ restricted to $L$. So, $|E(S)| = |E(S')| + |E(S_L)|$ and $|V(S)| \ge |V(S')| + |V(S_L)| - 1$ (note: $S'$ and $S_L$ may share one vertex in common where $L$ joins $C'$). Thus,

$$\mathrm{dens}_v(S) \le \frac{|E(S')| + |E(S_L)|}{(|V(S')| - 1) + (|V(S_L)| - 1)} \le g(t)$$

by induction and Claim 1. ∎


Lemma 8.9 Let $S$ be a subgraph of a $t$-chain $C$ of some constant length $r$ such that $V(S)$ contains the endpoints $x$ and $y$ but $V(S) \ne V(C)$. Then, for some constant $\epsilon' = \epsilon'(r) > 0$,

$$\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon'.$$

That is, $t$-chains are strictly balanced.


Proof: For $C$ a $t$-chain of length $r$, we have $\mathrm{edges}(C) = \frac{r}{2}[(t+1)(t+2)-2] = \frac{r}{2}[t^2+3t]$. So, we may upper bound the density as follows:

$$\mathrm{dens}(C) = \frac{\frac{r}{2}[t^2+3t]}{r(t+1)-1} = \frac{t^2+3t}{2t+2-2/r} \le \frac{t+3}{2}.$$

Again, we may assume that $S$ is a vertex-induced subgraph.

First, suppose $S$ consists of just one connected component. So, $S$ contains at least one non-endpoint for each $t$-link $L$ in $C$, and contains all link endpoints. Let us focus on some fixed $t$-link $L$ in $C$. If $S$ is missing $m$ vertices from that link, then it must be missing at least $(t+1) + t + (t-1) + \ldots + (t-m+2)$ edges from that link as well. So, the ratio of (edges missing) to (vertices missing) is at least $\frac{(t+1)+(t-m+2)}{2} \ge \frac{t+4}{2}$, for $m \le t-1$, the largest value of $m$ possible. Thus, if there are in total $m$ vertices in $C$ missing from $S$, then

$$\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{\mathrm{edges}(C) - m(t+4)/2}{\mathrm{nonroots}(C) - m} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon' \quad \text{for some } \epsilon' > 0,$$

because $\frac{t+4}{2} \ge \frac{1}{2} + \mathrm{dens}(C)$.

If $S$ consists of more than one connected component, then the endpoints of $C$ cannot be in the same component since we have assumed that $S$ is vertex-induced. We can thus partition $S$ into two disjoint subgraphs, $S_{start}$ and $S_{rest}$, where $S_{start}$ contains root $x$ and $S_{rest}$ is everything else. So, $\mathrm{nonroots}(S) = |V(S_{start})| + |V(S_{rest})| - 2$. Let $g(t) = \frac{t}{2} + \frac{t}{t+1}$. Applying Lemma 8.8, we get:

$$\mathrm{edges}(S) = |E(S_{start})| + |E(S_{rest})| \le g(t)(|V(S_{start})| - 1) + g(t)(|V(S_{rest})| - 1) = g(t)\,\mathrm{nonroots}(S).$$

So, $\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{|E(C)|}{|V(C)|-1} = \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)+1} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon'$ for $\epsilon' > 0$. ∎


Proof of Theorem 8.7: As in the proof of Theorem 8.4, we may assume that the adversary A always elects not to place edges into the graph. Let $C = C(r)$ be a $(k-1)$-chain of length $r$ between two fixed vertices $x$ and $y$. So:

$$-\frac{\mathrm{nonroots}(C)}{\mathrm{edges}(C)} = \frac{-(kr-1)}{\frac{r}{2}[k(k+1)-2]} = \frac{-2k}{k(k+1)-2} + \frac{2}{r[k(k+1)-2]} \quad \text{(see Fact 8.11)}.$$

Thus, for sufficiently large $r$, for some $\hat{\epsilon} > 0$, we have $p \ge n^{-\mathrm{nonroots}(C)/\mathrm{edges}(C)+\hat{\epsilon}}$. By Lemma 8.9 we know $C$ is strictly balanced, so let $\epsilon^* > 0$ be the constant of Theorem 8.3 such that for $p < n^{-\mathrm{nonroots}(C)/\mathrm{edges}(C)+\epsilon^*}$, we have $\Delta = \Delta(C, G_{SB}(n,p,k)) = o(1)$. Because additional edges cannot decrease the probability of success, we may for the purposes of analysis assume that $\hat{\epsilon} < \epsilon^*$. That is,

$$p = n^{-\mathrm{nonroots}(C)/\mathrm{edges}(C)+\hat{\epsilon}} \quad \text{for some } r \in \mathbb{Z},\; 0 < \hat{\epsilon} < \epsilon^*. \qquad (8.7)$$
We now examine all potential $(k-1)$-chains $C_i$ of length $r$ between $x$ and $y$. That is, all images over $\{x,y\}$ in $K_n$ of $C$, such that the image contains no edges between two vertices in the same color class in the adversary’s $k$-coloring. Since in the balanced semi-random model there are $\Theta(n)$ vertices in each color class and since $r$ is constant, the number of potential $(k-1)$-chains is $m = \Theta(n^{\mathrm{nonroots}(C)})$. Because each edge in some such $C_i$ is placed into $G$ with probability $p$, for any given $C_i$ we have

$$\Pr[C_i \subseteq G] = p^{\mathrm{edges}(C)} = n^{-\mathrm{nonroots}(C)+\mathrm{edges}(C)\hat{\epsilon}}.$$

So:

$$M = \prod_{i=1}^{m} \Pr[C_i \not\subseteq G] = \left(1 - p^{\mathrm{edges}(C)}\right)^{\Theta(n^{\mathrm{nonroots}(C)})} \le e^{-n^{-\mathrm{nonroots}(C)+\mathrm{edges}(C)\hat{\epsilon}}\,\Theta(n^{\mathrm{nonroots}(C)})} = e^{-\Theta(n^{\mathrm{edges}(C)\hat{\epsilon}})} = o(n^{-2}).$$

Since $\Pr[C_i \subseteq G] = p^{\mathrm{edges}(C)} = o(1)$ and $\Delta = o(1)$, the exponential correction term in Janson’s inequality is $1 + o(1)$, so we have that Pr[$x$ and $y$ are not connected by a $(k-1)$-chain of length $r$ in $G$] $\le M(1 + o(1)) = o(1/n^2)$. So, by a union bound over the $O(n^2)$ pairs, with high probability all pairs of vertices from the same color class are connected by some such chain and Algorithm t-Chain succeeds. ∎


8.5 Relating the balanced and unbalanced models 


For graphs of chromatic number 3, we had fairly good performance bounds even in the 
unbalanced model. However, for graphs of higher chromatic number, the algorithms required 
the number of vertices in each color class to be roughly balanced. The reason that the 
unbalanced case is harder is that if a color class is very small, then the noise rate p as a 
function of the number of vertices is dramatically lower. So, if (k — 1) color classes each are 
small, the algorithm is essentially required to solve a problem for a much lower noise rate 
on the $(k-1)$-chromatic graph defined by those colors. In particular, one gets the following theorem.

Theorem 8.10 If BPP $\not\supseteq$ NP, then for $k \ge 4$ there is no polynomial-time algorithm for $k$-coloring graphs in $G_S(n,p,k)$ with high probability, for $p = n^{-\epsilon}$ for any constant $\epsilon > 0$.


Proof: Suppose otherwise; that is, there exists an algorithm B for $k$-coloring graphs in $G_S(n,p,k)$ for $p = n^{-\epsilon}$ for some constant $\epsilon > 0$ where $k \ge 4$. We show how to use B to optimally color an arbitrary $(k-1)$-colorable graph in probabilistic polynomial time. Note that for $k \ge 4$, the problem of optimally coloring $(k-1)$-colorable graphs is NP-hard.

Let $G = (V, E)$ be a $(k-1)$-colorable graph on $n$ vertices. We create a new $N$-vertex graph $H = (V \cup V', F)$, where $V'$ is a set of $n^{3/\epsilon}$ vertices disjoint from $V$ and $N = n + n^{3/\epsilon}$, as follows. For each pair $u, v \in V$, if $(u,v) \in E$ then let $(u,v)$ be an edge in $F$ as well. Also, independently for each pair $v \in V$ and $v' \in V'$, let $(v,v')$ be an edge of $F$ with probability $1 - p$ for $p = N^{-\epsilon}$. Note that there are no edges in $F$ between vertices in $V'$. Now, feed graph $H$ to algorithm B. If B $k$-colors $H$, then with high probability it must assign at most $k-1$ colors to $V$ and therefore $(k-1)$-color $G$. The reason is that otherwise there are $k$ vertices in $V$ all given different colors by B, and with probability $(1-p)^k \ge 1 - kp = 1 - o(1)$, any given vertex in $V'$ is connected to all $k$ of them (and in fact with extremely high probability, there will be some such vertex in $V'$). This forces $k+1$ colors to be used in $H$.
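The construction of $H$ in this reduction can be sketched as below. It is only an illustration under stated assumptions: vertices of $V$ are the integers `0..n-1`, vertices of $V'$ are numbered from `n` upward, and the hypothetical parameter `m` overrides $|V'| = n^{3/\epsilon}$, which is far too large to instantiate for realistic $\epsilon$. The function name and the use of a seeded generator are likewise illustrative choices.

```python
import random


def build_reduction_graph(n, edges, eps, m=None, seed=0):
    """Build H = (V ∪ V', F) from G = (V, E) as in the proof of Theorem 8.10.

    Returns (N, F), with F a set of edges (a, b) satisfying a < b.
    """
    rng = random.Random(seed)
    if m is None:
        m = round(n ** (3 / eps))  # |V'| = n^(3/eps) in the text
    N = n + m
    p = N ** (-eps)  # noise rate p = N^(-eps)
    # Edges inside V are copied from G unchanged.
    F = {tuple(sorted(e)) for e in edges}
    # Each V-to-V' edge is present independently with probability 1 - p.
    for v in range(n):
        for v_prime in range(n, N):
            if rng.random() < 1 - p:
                F.add((v, v_prime))
    # No edges are ever placed inside V'.
    return N, F
```

Passing a small `m` makes the structure easy to inspect: $V'$ behaves as one extra color class that is almost completely joined to $V$.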

The main point of the proof is that an adversary with noise rate $p = N^{-\epsilon}$ can create a $k$-colorable graph on $N$ vertices in a distribution indistinguishable from that used to create $H$. In particular, as above, the adversary separates the $N$ vertices into one set $V$ of $n$ vertices and $k-1$ colors, and one set $V'$ of $N - n$ vertices and one color. It then attempts to place edges between vertices in $V$ exactly where they appear in the graph $G$ and to put in all edges between $V$ and $V'$. Since $n$ is so small (less than $N^{\epsilon/3}$), there are at most $N^{2\epsilon/3}$ potential edges in the set $V$. So for $p = N^{-\epsilon}$, with probability at least $1 - N^{-\epsilon/3}$ the adversary will be able to place exactly the edges it wishes between vertices in $V$ without any noise at all. Thus, since we assumed that B can $k$-color graphs created by such an adversary with high probability, B must $k$-color graph $H$ with high probability as well. ∎


In the balanced model, our best bound for 3-coloring $G \leftarrow G_{SB}(n,p,3)$ with high probability is $p \ge n^{-0.6+\epsilon}$. For the random model, we were able to 3-color for $p$ as low as $n^{-1+\epsilon}$. We leave as an open problem whether one can achieve such a low bound on $p$ for the semi-random model as well.


Chapter 9 


Lower bounds for independent set approximation 
based on approximate graph coloring 


In this section, we describe a lower bound for independent set approximation (or equiv- 
alently clique approximation) based on assumptions about the hardness of approximate 
graph coloring. Thought of in contrapositive form, we show how to get very good bounds 
for approximate graph coloring if we are given seemingly weak algorithms for approximating 
the maximum independent set in a graph. These results are corollaries to a basic technique 
of Berman and Schnitger [7] which they use to provide weaker lower bounds for independent 
set approximation based on other hard problems. 

Let is(G) denote the size of the largest independent set in graph G. For the Independent 
Set problem, we define the performance guarantee of an algorithm to be the worst-case ratio 
over all graphs G on n vertices, of is(G) to the size of the independent set found (with high 
probability if the algorithm is randomized) [12]. So, the lower the performance guarantee, 
the better the algorithm. The best performance guarantee known for a polynomial-time 
algorithm for Independent Set is $O(n/(\log n)^2)$ by Boppana and Halldórsson [12].

What we show in this chapter is that if there exists a polynomial-time algorithm with performance guarantee $O(n^{1-\epsilon})$ for Independent Set, then there is a polynomial-time algorithm to color $k$-colorable graphs with $O(\log n)$ colors and to color $(\log n)$-colorable graphs with polylog($n$) colors. The best algorithm known to date [22] for coloring $(\log n)$-colorable graphs uses more than $n/(\log n)^2$ colors. The best algorithm known for 3-colorable graphs (see Chapters 4 and 5 of this thesis) uses $O(n^{3/8})$ colors. So, this result shows that a performance that seems only somewhat better in approximating independent sets implies being able to do quite significantly better for approximate graph coloring.


9.1 Additional definitions and previous results 


Given a maximization (minimization) problem, we say an algorithm is a polynomial-time approximation scheme (PTAS) if for any constant $\epsilon > 0$, it runs in probabilistic polynomial time and finds a solution of value within a $(1+\epsilon)$ factor of the maximum (minimum). For example, consider the problem MAX 2-SAT of finding a solution that maximizes the number of satisfied clauses in a 2-CNF expression. A PTAS for this problem would be an algorithm that for any $\epsilon > 0$, given a sufficiently large 2-CNF expression, finds an assignment that satisfies $1/(1+\epsilon)$ of the maximum number of clauses possible.

Where | Lower Bound                     | Assumptions
FGLSS | constant                        | NP $\not\subseteq n^{O(\log\log n)}$-TIME
FGLSS | $2^{(\log n)^{1-\epsilon}}$     | NP $\not\subseteq n^{(\log n)^{O(1)}}$-TIME
BS    | $n^{\epsilon}$                  | No PTAS for MAX SNP
Here  | $n^{1-\epsilon}$                | No $O(\log n)$-coloring algorithm for $k$-colorable graphs or no polylog($n$)-coloring for $(\log n)$-colorable graphs.
Here  | $n/2^{(c \log n)^{1/2}}$        | No $O(n^{\epsilon})$-coloring for $k$-colorable graphs in quasi-polynomial ($n^{(\log n)^{O(1)}}$) time.

Table 9.1: Lower bounds for Independent Set approximation based on various assumptions.


MAX SNP is a syntactically defined class of problems described by Papadimitriou and 
Yannakakis [30]. It has the property that if one MAX SNP-hard or MAX SNP-complete 
problem has a polynomial-time approximation scheme then all problems in MAX SNP do 
as well. Some MAX SNP-complete problems include MAX $k$-SAT for $k \ge 2$ (finding the
maximum number of clauses satisfiable in a $k$-CNF formula), the problem Independent Set-
B of finding the largest independent set in a graph of constant degree bound B > 4, the 
TSP with edge weights 1 and 2, and others [30]. It is believed for these problems that no polynomial-time approximation schemes exist.


Berman and Schnitger prove that if there do not exist polynomial-time approximation 
schemes for MAX SNP-hard problems, then for some constant € > 0, no polynomial-time 
algorithm approximates Independent Set with performance guarantee $n^{\epsilon}$. In a recent result of a very different style, Feige, Goldwasser, Lovász, Safra, and Szegedy [19] prove a lower bound for approximating independent sets based on NP not being contained in “quasi-polynomial time”. In particular, they show that there is no polynomial-time algorithm for independent set with performance guarantee $O(2^{(\log n)^{1-\epsilon}})$ for any $\epsilon > 0$, if NP $\not\subseteq \bigcup_k [n^{(\log n)^k}$-TIME$]$. In addition, they show there is no algorithm with constant performance guarantee for Independent Set if NP $\not\subseteq n^{O(\log\log n)}$-TIME. Thus, they get a weaker conclusion than Berman and Schnitger (since $2^{(\log n)^{1-\epsilon}} < n^{\epsilon'}$ for all $\epsilon, \epsilon' > 0$) but based on likely a much more solid assumption. The new results presented in this chapter go in the other direction, proving a much stronger conclusion, but based on what may be much less solid assumptions. The results are summarized in Table 9.1.
9.1.1 The basic idea of the new results


Berman and Schnitger prove their result on approximating independent sets by amplifying 
“approximation gaps” in a constraint satisfaction problem MAX;,,. They then show this 
problem can be reduced in an approximation-preserving sense to Independent Set, and that the MAX SNP-complete problem MAX 2-SAT can be reduced to it in the same sense (Lemma 4.6 of [7]). Following the chain of reductions yields their $n^{\epsilon}$ bound. More simply, however, we
can apply their basic technique in a straightforward way directly to the independent set 
problem. Doing so allows us to relate the approximability of Independent Set not only to 
that of finding PTAS’s for MAX SNP-hard problems (in this case, the problem Independent 
Set-B), but also to the problem of finding good approximations for graph coloring. In fact, 
this version of their procedure (in some ways more general, in some more specific than the procedure in [7]) can be thought of as a randomized version of a commonly used graph product, and we describe the procedure from this point of view.


9.2 Randomized graph products 


We now describe the randomized graph product technique that will be the key to the results 
presented in this chapter. The technique is formalized in the procedure Rand-Select below. 

Algorithm Rand-Select takes as input an $n$-vertex graph $G$ and values $r$, $p$, and $t$, and produces as output a new $N$-vertex graph $H$. The purpose of this procedure is to amplify gaps in independent set approximation. In particular, the procedure will reduce a problem of finding an independent set of size $n/t^p$ in an $n$-vertex graph containing an (unknown) independent set of size $n/t$, to a problem of finding an independent set of size $N/(n^r)^p$ in an $N$-vertex graph containing an independent set of size $N/n^r$, where $N = n^{rp+2}$. Thus, for example, if the original graph was 3-colorable and so contained an independent set of size $n/3$, then the problem of finding an independent set of size $n/9$ in the original graph (a factor of 3 smaller) is mapped to a problem of finding an independent set a factor of $1/n^r$ smaller than the largest independent set in the new $n^{2r+2}$-vertex graph. We now describe the procedure.
Algorithm Rand-Select (Variant of procedure in proof of Lemma 4.3 in [7])
Given: An $n$-vertex graph $G = (V, E)$ and values $r$, $p$, and $t$.
Output: An $n^{rp+2}$-vertex graph $H$, and a mapping $\varphi$ from subsets of $G$ to vertices of $H$.

1. Select $N = n^{rp+2}$ subsets of vertices, each of size $r \log_t n$, at random from the vertices of $G$. Label the subsets $s_1, s_2, \ldots, s_N$.

2. For each subset $s_i$, associate a vertex $w_i$ in $H$. The edge set $E(H) = \{(w_i, w_j) \mid s_i \cup s_j$ is not independent in $G\}$. That is: $(w_i, w_j)$ is not an edge in $H$ only if both $s_i$ and $s_j$ are independent sets and in addition there are no edges between any vertex in $s_i$ and any vertex in $s_j$.

Define a mapping $\varphi(s_i) = w_i$ and $\varphi^{-1}(w_i) = s_i$. (See Figure 9.1.)

Figure 9.1: A sample mapping from sets $s_i$ in $G$ to vertices $w_i$ in $H$.
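The two steps of Rand-Select can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of [7]: vertices of $G$ are the integers `0..n-1`, the subset size $r \log_t n$ is rounded to an integer, each subset is drawn without repeated elements, and the hypothetical parameter `num_subsets` overrides $N = n^{rp+2}$ so that the sketch can be run on toy inputs. The mapping $\varphi$ is represented implicitly: vertex $w_i$ of $H$ is the index `i` of subset `subsets[i]`.

```python
import math
import random
from itertools import combinations


def rand_select(n, edges, r, p, t, num_subsets=None, seed=0):
    """Return (subsets, h_edges) for the randomized graph product of G.

    subsets[i] is s_i; h_edges holds pairs (i, j), i < j, with s_i ∪ s_j
    not independent in G.
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    size = max(1, round(r * math.log(n, t)))  # subset size r * log_t(n)
    N = num_subsets if num_subsets is not None else round(n ** (r * p + 2))

    # Step 1: choose N random vertex subsets of the prescribed size.
    subsets = [frozenset(rng.sample(range(n), size)) for _ in range(N)]

    def independent(vertex_set):
        return all(b not in adj[a] for a, b in combinations(vertex_set, 2))

    # Step 2: w_i and w_j are adjacent in H iff s_i ∪ s_j is not independent in G.
    h_edges = {
        (i, j)
        for i, j in combinations(range(N), 2)
        if not independent(subsets[i] | subsets[j])
    }
    return subsets, h_edges
```

With this representation, the image of a vertex set $S$ of $G$ is the list of indices `i` with `subsets[i] <= S`; when $S$ is independent, no pair of those indices appears in `h_edges`.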


(Note that for this to be a polynomial-time procedure, we need the product $rp$ bounded above by a constant. The value $t$ need not be a constant: in fact, we will later plug in $t = \log n$ to apply this technique to $(\log n)$-colorable graphs.)

Given a graph $G$ and new graph $H$ created using Rand-Select above, it will be convenient to extend the mapping $\varphi$ as follows. For $S \subseteq V(G)$, let $\varphi(S) = \{\varphi(s_i) \mid s_i \subseteq S\}$. Also, for $T$ a subset of $V(H)$, define $\varphi^{-1}(T) = \{v \mid v \in s_i$ for some $\varphi(s_i) \in T\} = \bigcup_{w \in T} \varphi^{-1}(w)$. Notice that $S \supseteq \varphi^{-1}(\varphi(S))$ and $T \subseteq \varphi(\varphi^{-1}(T))$; we do not necessarily get equality in the first case since $S$ may have elements not inside any $s_i \subseteq S$, and in the second case, for $w_i, w_j \in T$, the set $s_i \cup s_j$ may contain some $s_k$ for $w_k \notin T$. From this extended definition of $\varphi$ and the definition of $E(H)$ in step 2 of Rand-Select, we immediately get the following fact.


Fact 9.1 If $S$ is an independent set in $G$, then $\varphi(S)$ is an independent set in $H$. If $T$ is an independent set in $H$ of size at least 2, then $\varphi^{-1}(T)$ is independent in $G$.

Proof: If $w_i, w_j \in \varphi(S)$, then $s_i, s_j \subseteq S$. So if the edge $(w_i, w_j)$ were in $H$, then $s_i \cup s_j$ would be a non-independent subset of $S$, contradicting the independence of $S$. If $T$ is independent in $H$ of size greater than 1, then for each $w_i \in T$, the set $s_i$ must be independent in $G$. So, $\varphi^{-1}(T)$ is a union of independent sets that are pairwise independent of each other, and thus is independent itself. ∎


The purpose of procedure Rand-Select is as follows. Let $H$ = Rand-Select($G, r, p, t$). If we have an independent set $S$ of size $n/t$ in graph $G$, then since each $s_i$ has probability about $(\frac{1}{t})^{r \log_t n}$ of being chosen inside $S$, the expected size of $\varphi(S)$ in $H$ is $\Theta(n^{rp+2}(\frac{1}{t})^{r \log_t n}) = \Theta(n^{r(p-1)+2})$. In fact, with high probability, $\varphi(S)$ will be about that large. However, the expected size of $\varphi(S')$ for $S'$ an independent set of size $n/t^p$ is only $\Theta(n^{rp+2}(\frac{1}{t^p})^{r \log_t n}) = \Theta(n^2)$. In fact, it turns out that with high probability, $\varphi(S')$ will be small for all such $S'$ (this is the purpose of the “+2” in Rand-Select) as described in the following theorem.¹


Theorem 9.1 Let $G$ be an $n$-vertex graph with an independent set $S$ of size $n/t$ and let $H$ be the output of Rand-Select($G, r, p, t$). Then, with high probability, if $(r \log_t n)^2 = o(n/t^p)$ where $p > 1$, both of the following are true:

(1) $|\varphi(S)| \ge \frac{1}{4} n^{r(p-1)+2}$, and

(2) for every independent set $S'$ in $G$ of size at most $n/t^p$, we have $|\varphi(S')| \le 4n^2$.

Note that (1) implies $|\varphi(S)| = \Omega(N/n^r)$ and (2) implies $|\varphi(S')| = O(N/n^{rp})$, for $N = |V(H)|$.


The proof of this theorem uses the following standard (Chernoff variant) probabilistic inequality (e.g., see [1]).

Fact 9.2 Suppose $X_1, \ldots, X_m$ are mutually independent $\{0,1\}$-valued random variables. Let $X = X_1 + X_2 + \ldots + X_m$ and let $\mu = E[X]$. Then:

$$\Pr[X > 2\mu] < \left(\frac{e}{4}\right)^{\mu}, \qquad \Pr[X < \mu/2] < e^{-\mu/8}.$$


Proof of Theorem 9.1: First, claim (1) (the easy half). Given $S \subseteq V(G)$ of size $n/t$, consider a run of algorithm Rand-Select. Let $X_i$ be a random variable such that $X_i = 1$ if $s_i \subseteq S$, and otherwise $X_i = 0$. So, $X = \sum X_i$ equals $|\varphi(S)|$. Since $S$ has size $n/t$, we have:

$$\Pr[X_i = 1] = \binom{n/t}{r \log_t n} \Big/ \binom{n}{r \log_t n} = \left(\frac{1}{t}\right)^{r \log_t n}(1 + o(1)) = n^{-r}(1 + o(1)),$$

where the middle step holds since $(r \log_t n)^2 = o(n/t)$. Thus, $E[X] = n^{r(p-1)+2}(1 + o(1))$ and with high probability, we have $X = |\varphi(S)| \ge \frac{1}{4} n^{r(p-1)+2}$.

Now, claim (2): we show that for every small set $S'$ (and thus every small independent set $S'$), $\varphi(S')$ has size at most $4n^2$. We do this by showing that each individual set $S'$ has an extremely low probability of having an image under $\varphi$ larger than this value.


¹We can actually replace the “+2” with “+1” to get a slightly better bound in Theorem 9.2 with only a
little extra effort. However, the precise value of the constant is not crucial for us unless it could somehow 
be made significantly less than 1. 
Fix a given set $S'$ in $G$ of size $n/t^p$. Let $X'_i = 1$ if $s_i \subseteq S'$, and $X'_i = 0$ if $s_i \not\subseteq S'$, and let $X' = \sum X'_i$. Again, since $(r \log_t n)^2 = o(n/t^p)$, we have:

$$\Pr[X'_i = 1] = \left(\frac{1}{t^p}\right)^{r \log_t n}(1 + o(1)) = n^{-rp}(1 + o(1)).$$

Thus, $0.5n^2 < E[X'] < 2n^2$. Applying Fact 9.2, we get that $\Pr[X' > 4n^2] < (e/4)^{0.5n^2}$, which equals $2^{-cn^2}$ for some constant $c$. Note that if $S'$ has size less than $n/t^p$, then $\Pr[X' > 4n^2]$ can only be lower. Now the crucial point: the probability that $X' > 4n^2$ is so small that even if we now sum over all at most $2^n$ such sets $S'$, we get that $\Pr[|\varphi(S')| > 4n^2]$ for any $S'$ of size at most $n/t^p$ is no more than $2^n 2^{-cn^2} = o(1)$. Thus, with high probability, both conclusions of Theorem 9.1 hold. ∎


So, algorithm Rand-Select maps a problem of finding an independent set of size $1/t^{p-1}$ times the largest in the original $n$-vertex graph to a problem of finding one of size $1/n^{r(p-1)} = 1/N^{(1 - \frac{r+2}{rp+2})}$ times the largest in the new $N$-vertex graph. In particular, one gets the following theorem.

Theorem 9.2 Suppose there exists a (randomized) algorithm A for Independent Set on $N$-vertex graphs that runs in time $f(N)$ and has performance guarantee $< \frac{1}{16} N^{(1 - \frac{r+2}{rp+2})}$ for constants $r, p$. Then, there is a randomized algorithm B that on $n$-vertex graphs containing an independent set of size $n/t$, finds an independent set of size $n/t^p$ with high probability in time $f(n^{rp+2}) + O(n^{2rp+O(1)})$, so long as $(r \log_t n)^2 = o(n/t^p)$.


Proof: Given an $n$-vertex graph $G$ with independent set $S$ of size $n/t$, run algorithm Rand-Select($G, r, p, t$) to create graph $H$ on $n^{rp+2}$ vertices. This step takes $O(n^{2rp+O(1)})$ time. By Fact 9.1, we know that $\varphi(S)$ is independent in $H$, so by Theorem 9.1 claim (1), we have that with high probability $H$ contains an independent set of size $\frac{1}{4} n^{r(p-1)+2}$. So, algorithm A in time $f(n^{rp+2})$ finds an independent set $T$ in $H$ of size greater than:

$$\frac{\frac{1}{4} n^{r(p-1)+2}}{\frac{1}{16}\,[n^{rp+2}]^{(1 - \frac{r+2}{rp+2})}} = 4n^2.$$

Now, look at $S' = \varphi^{-1}(T)$, which is independent in $G$ by Fact 9.1. We know, by definition of $\varphi$, that $\varphi(S') \supseteq T$ and so $|\varphi(S')| > 4n^2$. Thus by Theorem 9.1 claim (2), we have: with high probability $S'$ must have size at least $n/t^p$. ∎


If we plug $p = 1 + \epsilon$ into Theorem 9.2 and let $r = 2(1-\delta)/\delta$ so $\frac{r}{r+2} = 1 - \delta$, then:

$$N^{(1 - \frac{r+2}{rp+2})} = N^{\frac{r\epsilon}{(1+\epsilon)r+2}} > N^{\frac{\epsilon}{1+\epsilon}\cdot\frac{r}{r+2}} = N^{\frac{\epsilon}{1+\epsilon}(1-\delta)}.$$

So, if we view Theorem 9.2 in the contrapositive form, we get the following corollary.

Corollary 9.3 If for some $t$ and some constant $\epsilon > 0$ there is no randomized polynomial-time algorithm which finds an independent set of size $n/t^{1+\epsilon}$ on $n$-vertex graphs containing an independent set of size $n/t$, then: for any constant $\delta > 0$ there is no (randomized) polynomial-time algorithm with performance guarantee $o(n^{\frac{\epsilon}{1+\epsilon}(1-\delta)})$ for general Independent Set.


This corollary immediately implies the Berman-Schnitger result on the approximability of Independent Set [7] by using the MAX SNP-hard problem Independent Set-B. Any $n$-vertex graph with degree at most $B$ must have an independent set of size $n/(B+1)$. So, we get the following.

Corollary 9.4 (Berman and Schnitger) If there do not exist randomized PTAS’s for MAX SNP-hard problems, then there exists $c > 0$ such that Independent Set does not have a (randomized) polynomial-time approximation algorithm with performance guarantee $n^c$.

In particular, if for some $\epsilon$ and $B$, Independent Set-B does not have a randomized polynomial-time approximation algorithm with performance guarantee $(1+\epsilon)$, then Independent Set does not have a polynomial-time approximation algorithm with performance guarantee $o(n^{\frac{c}{1+c}(1-\delta)})$ for $c = \log_{B+1}(1+\epsilon)$, for any constant $\delta > 0$.

We can also use Theorem 9.2 to provide stronger bounds on independent-set approximation
based on assumptions of the hardness of approximate graph coloring. In particular,
we can prove the following.


Theorem 9.5 Suppose there exists a (randomized) polynomial-time algorithm A for
Independent Set with performance guarantee $n^{1-\epsilon}$ for some $\epsilon > 0$. Then, there is a
randomized polynomial-time algorithm B that will color any n-vertex k-colorable graph with
O(log n) colors, and color any n-vertex O(log n)-colorable graph with $O(\log^c n)$ colors (where
$c < 1 + 3/\epsilon$).


Theorem 9.6 Suppose there exists a (randomized) quasipolynomial-time algorithm A for
Independent Set with performance guarantee $N/2^{\sqrt{c \log N}}$ on N-vertex graphs. Then there is
a randomized quasipolynomial-time algorithm B to color any n-vertex k-colorable graph with
$O(n^\epsilon)$ colors, where $\epsilon = (20 \log k)/c$.


Proof of Theorems 9.5 and 9.6: Given an n-vertex t-colorable graph G, we know there
exists an independent set of size at least n/t. Suppose we had an algorithm B' that on any
n-vertex graph with an independent set of size n/t were guaranteed to find an independent
set of size $n/t^\rho$ for some constant $\rho$. We could then find a coloring of G with at most
$t^\rho \ln n$ colors by applying B', coloring the independent set found with one color, and then
repeating on the remaining graph G' of size at most $n(1 - 1/t^\rho)$. Note that since G is
t-colorable, so is graph G', and thus G' has an independent set of at least 1/t of its vertices
as well and we may reapply B'. The number of colors used by this algorithm B is at most a
value C such that $n(1 - 1/t^\rho)^C = 1$, so $C \le -(\ln n)/\ln(1 - 1/t^\rho) \le t^\rho \ln n$. Thus if t is some
constant k, the number of colors used is O(log n), and if t = log n, the number of colors used
is $O((\log n)^{\rho+1})$. (The fact that log n decreases as the graph gets smaller only helps.)
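
The peeling argument above can be sketched in a few lines of Python. This is only an illustration: `greedy_independent_set` is a hypothetical stand-in for the assumed algorithm B' (a simple minimum-degree greedy heuristic with no approximation guarantee), and `color_bound` just evaluates the $t^\rho \ln n$ expression from the proof.

```python
import math


def greedy_independent_set(adj, vertices):
    """Hypothetical stand-in for B': greedily pick minimum-degree vertices."""
    remaining = set(vertices)
    chosen = set()
    while remaining:
        # Pick the vertex with the fewest remaining neighbors (ties by label).
        v = min(remaining, key=lambda u: (len(adj[u] & remaining), u))
        chosen.add(v)
        remaining -= adj[v] | {v}
    return chosen


def color_by_independent_sets(adj, find_independent_set=greedy_independent_set):
    """Give one new color to each independent set found, then repeat on the rest."""
    uncolored = set(adj)
    coloring = {}
    color = 0
    while uncolored:
        independent = find_independent_set(adj, uncolored)
        for v in independent:
            coloring[v] = color
        uncolored -= independent
        color += 1
    return coloring


def color_bound(n, t, rho):
    """The bound t^rho * ln(n) on the number of colors from the proof."""
    return (t ** rho) * math.log(n)
```

Any routine that returns an independent subset of the uncolored vertices can be passed as `find_independent_set`; the bound on the number of colors only holds when that routine meets the guarantee assumed for B'.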

If there exists a polynomial time algorithm A for Independent Set with performance 
guarantee n}~‘ for some constant € > 0, then for p > oa algorithm A has performance 
guarantee o(n~ 7433). So, we can apply Theorem 9.2 with r = 1 to get a randomized 
polynomial-time algorithm B' with the guarantee we need. This proves Theorem 9.5.

For Theorem 9.6, we must be a bit careful.² The quasipolynomial-time algorithm B
is as follows. Given an n-vertex k-colorable graph G, let $\rho = \epsilon \log_k n$ so that $n^\epsilon = k^\rho$, and
let $N = n^{\rho+2}$. Plugging in these values, we get:


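$$
\begin{aligned}
2^{\sqrt{c \log N}} &= N^{\sqrt{c/((\rho+2)\log n)}} \\
&\ge N^{\sqrt{c/(2\rho \log n)}} \\
&= N^{\sqrt{c\epsilon/(2\rho^2 \log k)}} && \text{(using } \log n = (\rho \log k)/\epsilon\text{)} \\
&= N^{\sqrt{10}/\rho} > n^{3} && \text{(using } \epsilon = (20 \log k)/c\text{)}
\end{aligned}
$$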


Thus, the performance guarantee $N/2^{\sqrt{c \log N}}$ of algorithm A for Independent Set on
N-vertex graphs is $o(N/N^{3/(\rho+2)}) = o(N^{1 - 3/(\rho+2)})$. So, we can again apply Theorem 9.2 with
r = 1 to get an algorithm B guaranteed on any k-colorable graph to find an independent set of
size $n/k^\rho = n^{1-\epsilon}$. Thus, B makes progress (in fact, progress type 1 of Section 3.3) towards
an $O(n^\epsilon)$-coloring of G.

Since algorithm A is quasipolynomial, algorithm B runs in time quasipolynomial in
$n^{\rho+2}$, which is quasipolynomial in n since $n^{\rho+2} = n^{O(\log n)}$. ∎


²Note: it is easy to fall into a trap in Theorem 9.2 by falsely thinking that if $\rho$ is a function of n (e.g.,
$\rho = \epsilon \log n$) for algorithm B, then we can plug in the same function of N (e.g., $\epsilon \log N$) for algorithm A.


Chapter 10 


Possibilities for improvement, open problems, and 
conclusion 


10.1 Possibilities for improvement 


Algorithm First-Approx performs most poorly when (1) many vertices share about n°? neigh- 
bors in common, and (2) the average vertex degree is about n°*. If the edges in the graph 
were distributed randomly, this combination of events would likely not occur since for such 
a low average degree, any two given vertices would be expected to share less than one 
neighbor in common. Instead, the graph must contain high density regions. For example, 
a graph could have properties (1) and (2) above if it consists of a collection of “clusters” 
of size O(n®*) such that each vertex inside a given cluster has O(n°*) neighbors within 
the cluster and O(n°*) neighbors distributed throughout the other clusters. Thus, if the 
edges within a cluster are distributed randomly, then two vertices inside the same cluster share
on average O((n°*)?/n°®) = O(n°?) neighbors in common, even though the degrees are 
low. (The purpose of giving to each vertex @(n°*) neighbors in the other clusters is so 
that the distance-2 neighbor set N(N(v)) for each vertex v may have size 2(n°%) to avoid 


immediately making progress through Corollary 3.2.) 


Algorithm Improved-Approx achieves better performance by taking advantage of such 
high density regions when they are found. However, one other possible approach is the 
following. Suppose by removing 9/10 of the edges in the graph, one could somehow get 
rid of such high-density regions and prove a stronger analog of Theorem 4.1 (bounding the 
number of shared neighbors of two vertices). Then, Theorems 4.5 and 4.6 would still apply 
to show that some set T = N,(N(v)MJ;) in the new graph is both large and has a large 
fraction of its vertices red. The main point here is that even though an independent set in 
the new graph might not be an independent set in the original graph, there still must be 
some color class in a 3-coloring of the original graph that satisfies the A = 1/2 condition (see 
Theorem 4.5) in the new graph. Also, the average degree has only changed by a constant 
factor, so the set T produced will still be large. One small difficulty is that Corollary 4.7 


relies on a large minimum degree which might no longer exist in the new graph. This 


problem can be overcome by simply deleting all vertices with degree less than, say, 1/10 of 
the average in the new graph. 

A different way one might be able to do significantly better is to consider distance-3
neighborhoods of vertices (or perhaps even distance-t neighborhoods for larger t). From
preliminary calculations, I believe that some of the results for distance-2 neighborhoods
may go through: for example, that one could find a set T' with an independent set of 3/8
of its vertices inside the distance-3 neighborhood. (Note that if the edges were distributed
randomly, one would expect a ratio of 3:2:3 of blues to reds to greens inside the distance-3
neighbors of v for v ∈ red.) However, all the techniques given here for forcing expansion
(that is, for forcing the set found to be large) seem to break down completely.


10.2 Open problems and conclusion 


We have described here an algorithm guaranteed to color any 3-chromatic graph with
$O(n^{3/8})$ colors in the worst case, and shown how these techniques can be used to improve
previous bounds for coloring k-chromatic graphs for k > 3 as well. Clearly, however, there
remains a long way to go. There is no reason to believe an $O(n^{3/8})$ bound is intrinsic to the
coloring problem. In fact, for coloring 3-colorable graphs, to date there is no lower bound
known greater than 3. That is, it remains unknown whether there is any intrinsic reason
why one could not 4-color any given 3-colorable graph in polynomial time. It would be a
very significant contribution to this area if one could make headway in this direction. For
the general problem of coloring graphs of arbitrary chromatic number, the best lower bound
remains a factor of $2 - \epsilon$ from 1976 by Garey and Johnson [20].

The random and “semi-random” case appears much easier. We have described here an 
algorithm to color a random k-colorable graph in the model G(n, p, k) for p as low as $n^{-1+\epsilon}$
(see Section 7). For even smaller values of p, perhaps some other strategy might work well. 
An intriguing open question is whether there might be a polynomial-time algorithm to color 
graphs in G(n, p,k) for every p, or whether there is some intrinsic reason such an algorithm 
should not exist. Experimental work of Petford and Welsh [31] suggests that at least for 
the heuristics used there, low values of p for which the average degree in the graph is about 
5 or 6 may be the hardest. 
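
For readers who want to experiment with this model, the following is a minimal Python sketch of a sampler for G(n, p, k) as the model is described in this thesis: each vertex is assigned a uniformly random color class, and each pair of differently colored vertices becomes an edge independently with probability p. The function name `sample_gnpk` and the `seed` parameter are illustrative choices, not part of the original text.

```python
import random


def sample_gnpk(n, p, k, seed=None):
    """Sample a random k-colorable graph: returns (hidden coloring, edge set)."""
    rng = random.Random(seed)
    # Assign each vertex to one of k color classes uniformly at random.
    color = [rng.randrange(k) for _ in range(n)]
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            # Only pairs in different color classes are potential edges.
            if color[u] != color[v] and rng.random() < p:
                edges.add((u, v))
    return color, edges
```

The expected average degree is roughly p·n·(k−1)/k, so the low-degree regime mentioned above (average degree about 5 or 6) corresponds to p on the order of a small constant divided by n.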

For the semi-random model we described in Chapter 8 an algorithm to color graphs in 
Gsp(n,p,3) for p as low as n~°%t*, and for higher values of p for k > 3 (see Table 8.1). 
One obvious open question is whether one can optimally color such graphs for lower noise 
rates p. A second open direction to explore is coloring graphs based on even “harder”
semi-random sources that have been proposed and studied in the cryptographic literature. In
these models, the “noise” is not independent over each bit; rather, we are simply guaranteed
that no sequence of bits of some length occurs too often. In a graph setting, this might
correspond to a model in which we are simply guaranteed that for any given collection of
“potential edges,” no fixed configuration occurs with more than some specified probability.
The reader is referred to papers of Chor and Goldreich [17] and Zuckerman [44] for more
details on these “weak random” models.


Appendix A 


The Vertex-Cover / Independent-Set 
approximation algorithm 


We now describe a simplified version of the Vertex-Cover approximation algorithm of
Bar-Yehuda and Even [4] and Monien and Speckenmeyer [28], specialized to its use in this thesis.
The version here is taken from a treatment given by Boppana and Halldórsson [12]. We
will describe the algorithm as an Independent Set approximation algorithm for the special
case where the input n-vertex graph contains an independent set of at least
$\frac{1}{2}(1 - \frac{1}{\log n})$ of its vertices. The output of the procedure is an independent set of size
$\Omega(n/\log n)$.


Algorithm Approx-IS [Simplified version of the BE/MS algorithm]

Given: An n-vertex graph G which has an independent set of size at least $\frac{1}{2}(1 - \frac{1}{\log n})n$.

Output: An independent set of size $\Omega(n/\log n)$.

1. Remove all odd cycles of length $\le 2l+1$ for $l = \frac{\log n}{6} - \frac{1}{2}$. See Note 1 below.
   (Assume for simplicity that $\frac{\log n}{6} - \frac{1}{2}$ is an integer.)

2. Initialize I, the independent set found, to $\emptyset$.

3. Choose $v \in V$.

4. For $i \in \{0, \ldots, l\}$, let $V_i$ = the set of vertices of distance i from v.

5. For $i \in \{0, \ldots, l\}$, let $S_i = V_i \cup V_{i-2} \cup V_{i-4} \cup \ldots$.
   Note that $S_i$ is an independent set since there are no odd cycles of length $\le 2l+1$.
   For example, if there were an edge between a vertex in $V_2$ and a vertex in $V_4$
   then there is a cycle of length 7.
   Also, note that $N(S_i) = S_{i+1}$.

6. Let $i_0 \le l$ be an index such that $|N(S_{i_0})| \le n^{1/(l+1)}|S_{i_0}|$.
   This property must hold for some $i_0 \in \{0, \ldots, l\}$ because otherwise:
   $|N(S_l)| > n^{1/(l+1)}|S_l| > n^{2/(l+1)}|S_{l-1}| > n^{3/(l+1)}|S_{l-2}| > \ldots > n^{(l+1)/(l+1)}|S_0| = n$,
   a contradiction.

7. Let $I \leftarrow I \cup S_{i_0}$ and let $V \leftarrow V - S_{i_0} - N(S_{i_0})$.
   If V is non-empty, then go back to Step 3. Otherwise output set I.
   See Note 2 below.
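
Steps 2 through 7 can be sketched in Python as follows. This is a hedged illustration rather than the thesis's implementation: it assumes the input already has no short odd cycles (Step 1 is not implemented), takes the layer depth `l` as a parameter, and, if no layer meets the $n^{1/(l+1)}$ threshold (which the argument in Step 6 rules out under the algorithm's assumptions), falls back to the layer with the smallest ratio. The names `bfs_layers` and `approx_is_layers` are hypothetical.

```python
from collections import deque


def bfs_layers(adj, alive, source, depth):
    """layers[i] = vertices of `alive` at distance exactly i from source, i <= depth."""
    dist = {source: 0}
    layers = [[source]]
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue  # do not explore beyond the requested depth
        for w in adj[u]:
            if w in alive and w not in dist:
                dist[w] = dist[u] + 1
                if dist[w] == len(layers):
                    layers.append([])
                layers[dist[w]].append(w)
                queue.append(w)
    return layers


def approx_is_layers(adj, l):
    """Steps 2-7, assuming odd cycles of length <= 2l+1 were already removed."""
    n = len(adj)
    threshold = n ** (1.0 / (l + 1))
    alive = set(adj)
    independent = set()
    while alive:
        v = min(alive)  # Step 3: any vertex works; min() keeps runs repeatable
        layers = bfs_layers(adj, alive, v, l)
        parity_union = [set(), set()]
        best = None  # (ratio, S_i, N(S_i))
        for i, layer in enumerate(layers):
            # Step 5: S_i is the union of layers i, i-2, i-4, ...
            parity_union[i % 2] |= set(layer)
            s_i = set(parity_union[i % 2])
            nbrs = {w for u in s_i for w in adj[u] if w in alive} - s_i
            ratio = len(nbrs) / len(s_i)
            if ratio <= threshold:  # Step 6: first index meeting the bound
                best = (ratio, s_i, nbrs)
                break
            if best is None or ratio < best[0]:
                best = (ratio, s_i, nbrs)  # fallback: smallest ratio seen
        _, s_best, n_best = best
        independent |= s_best          # Step 7: add S_{i0} to I ...
        alive -= s_best | n_best       # ... and delete S_{i0} and N(S_{i0})
    return independent
```

On a bipartite input (which has no odd cycles at all) every $S_i$ is independent, so the returned set is an independent set in which every other vertex has a neighbor.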


Note 1: Step 1 removes all odd cycles of length $\le 2l+1$. An odd cycle of length $2i+1$
may have at most i vertices in any independent set in G. So, if m vertices remain after
this step (so n − m are removed), we have removed at most $\frac{l}{2l+1}(n-m)$ vertices from any
independent set in G. Thus, the maximum independent set in G may have size at most
$m + \frac{l}{2l+1}(n-m)$. This implies that the number of vertices m remaining is at least n/log n
since otherwise,

$$
\begin{aligned}
m + \frac{l}{2l+1}(n-m) &\le m + (n-m)\,\frac{\log n - 3}{2\log n} \\
&= m + (n-m)\left(\frac{1}{2} - \frac{3}{2\log n}\right) \\
&< \frac{n}{2} - \frac{n}{\log n} + \frac{3n}{2\log^2 n} \\
&< \frac{1}{2}\left(1 - \frac{1}{\log n}\right)n \qquad \text{(for n sufficiently large)}
\end{aligned}
$$

This contradicts our assumption on the largest independent set in G.


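Note 2: By Note 1, after Step 1 we know graph G has at least n/log n vertices. Each
application of Step 6 removes from V at most $O(n^{1/(l+1)})$ times as many vertices as added
to I. So, the final set I reported in Step 7 must be large enough so that $|I|\,n^{1/(l+1)} =
\Omega(n/\log n)$. That is, it must be the case that:

$$|I| = \Omega\left(\frac{n}{\log n}\, n^{-1/(l+1)}\right) = \Omega\left(\frac{1}{\log n}\, n^{1 - 1/(l+1)}\right).$$

For $l = \frac{\log n}{6} - \frac{1}{2}$, we have:

$$1 - \frac{1}{l+1} = 1 - \frac{1}{\frac{\log n}{6} + \frac{1}{2}} = \frac{\log n - 3}{\log n + 3} \ge \frac{\log n - 6}{\log n} = 1 - \frac{6}{\log n}.$$

So, finally, this implies that:

$$|I| = \Omega\left(\frac{1}{\log n}\, n^{1 - 6/\log n}\right) = \Omega\left(\frac{1}{\log n}\, n\, 2^{-6}\right) = \Omega(n/\log n). \qquad ∎$$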


Appendix B 
An analog of Spencer’s result on counting 
extensions 


In this section, we prove an analog of a theorem of Spencer for counting the number of 
images of a rooted graph. 

If (R, H) is a rooted graph (see Definitions 8.2, 8.3, and 8.4), define Im(H, G) to be
the set of images of H in G and let Num(H, G) = |Im(H, G)|. Also, for M some model
(such as G(n, p) or G(n, p, k)) define $\mu(H, M)$ to be the expected number of images of H in
$G \leftarrow M$.

Spencer [35] proves the following result for the random graph model G(n, p). 


Theorem B.1 (Spencer) Let (R, H) be strictly balanced on some constant number of
vertices and let $\delta, c > 0$. Then, $\exists K > 0$ so that if p is such that $\mu = \mu(H, G(n,p)) > K \log n$,
then for $G \leftarrow G(n,p)$:

$$\Pr[(1-\delta)\mu < \mathrm{Num}(H,G) < (1+\delta)\mu] = 1 - o(n^{-c}).$$


In order to prove that the l-path algorithm of Chapter 7 works as claimed, we need an
analog of Spencer’s result (at least for the case of H a path of some constant length l)
for the model G(n, p, k). (As noted in Chapter 7, paths of length l between two roots x and
y are strictly balanced.) In fact, Spencer’s proof goes through in the G(n, p, k) model with
only minor modifications. We describe here what those modifications are and how they
affect Spencer’s proof.

Spencer’s result is easiest to prove for the special (but main) case where there exists
some sufficiently small $\epsilon$ so that the expected number $\mu$ of images of H in G is at most $n^\epsilon$;
that is, when $K \log n < \mu < n^\epsilon$. To simplify our discussion, we will only consider that case
here. We will also consider only rooted graphs (R, H) that have no automorphisms fixing
the roots. Spencer counts “extensions” which are essentially all the different maps of H
into G, whereas we count the images of H; for rooted graphs without such automorphisms,
these are the same quantity. Note that paths of length l fit into this category.
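
To make the quantity Num(H, G) concrete for the path case, here is a small Python sketch that counts the images of a path between two fixed, distinct roots, that is, the simple paths with a given number of edges from x to y. The function name `count_path_images` and the adjacency-dictionary representation are illustrative assumptions; the brute-force search is only meant for small examples.

```python
def count_path_images(adj, x, y, length):
    """Count simple paths with exactly `length` edges from root x to root y (x != y)."""

    def extend(u, remaining, used):
        if remaining == 0:
            return 1 if u == y else 0
        total = 0
        for w in adj[u]:
            if w in used:
                continue  # an image uses distinct vertices
            if w == y and remaining != 1:
                continue  # the root y may only appear as the final vertex
            total += extend(w, remaining - 1, used | {w})
        return total

    return extend(x, length, {x})
```

For instance, in the complete graph on four vertices there are two paths of length 2 and two paths of length 3 between any pair of roots.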

For H a path of length l between roots x and y, we would like to prove that the number
of images of H in $G \leftarrow G(n,p,k)$ given that x and y are chosen the same color, or given
that x and y are chosen of different color, are both within $(1 \pm o(1))$ of the expectation. In
order not to prove essentially the same theorem twice (once for each case), let us define
the notion of a random k-colorable graph given a particular root coloring. For a root set
R, there are $k^{|R|}$ different possible ways to assign k colors to the |R| vertices. So:

• Let $G_j^R(n,p,k)$ be the model G(n, p, k) given that the subset R of V has the jth of
  $k^{|R|}$ possible colorings.


B.1 Modifying Spencer’s result 


Suppose (R, H) is a rooted graph on a constant number of vertices and X is some image
of H in $K_n$. Let v = nonroots(H) and e = edges(H). Then $\Pr[X \subseteq G \mid G \leftarrow G(n,p)] = p^e$.
If H has no automorphisms, then $\mu(H, G(n,p)) = n^v p^e (1 - o(1))$. The key fact that
allows Spencer’s argument to go through for G(n, p, k) is that if H is also k-colorable, then
$\Pr[X \subseteq G \mid G \leftarrow G_j^R(n,p,k)] = \Theta(p^e)$. The reason is that since H has only a constant
number of vertices, there is a constant probability at least $(1/k)^{|V(X)|}$ that in the creation
of G, the vertices of X are placed into color classes that legally color the graph. So,
$\Pr[X \subseteq G] \ge (1/k)^{|V(X)|} p^e = \Theta(p^e)$.
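
The constant-probability claim can be checked by brute force for the path case. The sketch below, with the hypothetical helper `legal_fraction`, fixes the colors of the two roots of a path and counts how many of the $k^{l-1}$ equally likely colorings of the internal vertices are legal; the ratio is the probability that a fixed image is legally colored given the root coloring. It assumes uniform, independent color assignment as in the model described in this thesis, and k ≥ 2.

```python
from itertools import product


def legal_fraction(k, length, same_root_color):
    """Return (legal, total) colorings of the internal vertices of a path.

    The path has `length` edges; its two endpoints (the roots) are fixed to the
    same color or to two different colors, and each of the length-1 internal
    vertices independently receives one of k colors.
    """
    color_x = 0
    color_y = 0 if same_root_color else 1
    internal = length - 1
    legal = 0
    for middle in product(range(k), repeat=internal):
        colors = (color_x,) + middle + (color_y,)
        # Legal means every edge of the path joins two different colors.
        if all(a != b for a, b in zip(colors, colors[1:])):
            legal += 1
    return legal, k ** internal
```

For k = 3 and a path of length 3, this gives 2 of 9 legal colorings when the roots share a color and 3 of 9 when they differ; both are at least $(1/k)^{l-1} = 1/9$.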


We now describe how to modify Spencer’s proof to prove the following result. 


Theorem B.2 Let (R, H) be strictly balanced on some constant number of vertices with
no automorphisms fixing the roots, and let $\delta, c > 0$. Then, there exist $K, \epsilon > 0$ so that if
$\mu = \mu(H, G_j^R(n,p,k)) \in [K \log n, n^\epsilon]$, then for $G \leftarrow G_j^R(n,p,k)$:

$$\Pr[(1-\delta)\mu < \mathrm{Num}(H,G) < (1+\delta)\mu] = 1 - o(n^{-c}).$$


Proof: For convenience, let $M = G_j^R(n,p,k)$, v = nonroots(H), e = edges(H), and let
$G \leftarrow M$. Also, let $X_1, \ldots, X_m$ be the images of H in $K_n$, and let $A_i$ be the event that
$X_i \subseteq G$. We may assume that H is k-colorable given that the root set has the jth possible
assignment of colors, else $\mu$ would equal 0.

From our above observations, for any given image $X_i$, $\Pr[X_i \subseteq G] = \Theta(p^e)$ and so
$\mu = \Theta(n^v p^e)$. For convenience, let us define $\hat{p}$ so that $\Pr[X_i \subseteq G] = (\hat{p})^e$ and thus
$\mu = (1 - o(1))\,n^v \hat{p}^e$.

Spencer’s proof for G(n, p) where $\mu(H, G(n,p)) \in [K' \log n, n^{\epsilon'}]$ for sufficiently large K'
and sufficiently small $\epsilon'$, proceeds in three stages. First, he proves a theorem stated here as
Theorem 8.2 of Chapter 8. We have already proven the analog in Theorem 8.3.¹ Second,
he proves that for $G' \leftarrow G(n,p)$, with probability $1 - o(n^{-c})$, the size of every maximal
family F of disjoint $X_i$ in G' is within $(1 \pm \delta)$ of $\mu$. Finally he shows that for any fixed
maximal family F, with probability $1 - o(n^{-c})$ there are only O(1) images $X_j \notin F$ in G' that
intersect some $X_i \in F$. Since every image of H must either belong to F or else intersect
some $X_i \in F$ (as F is a maximal family of disjoint images), the last 2 parts of Spencer’s
argument imply that Num(H, G') is within $(1 \pm \delta)$ of $\mu$ with probability $1 - o(n^{-c})$.

¹Technically, Theorem 8.3 was proven for the semi-random model Gsp(n,p,k). However, since in
$G_j^R(n,p,k)$ there are w.h.p. $\Theta(n)$ vertices of each color class, the bound holds for this model as well.

Note that for X any subgraph of $K_n$ at all,

$$\Pr[X \subseteq G \mid G \leftarrow M] \le \Pr[X \subseteq G' \mid G' \leftarrow G(n,p)].$$

The reason is simply that each edge is placed into $G \leftarrow M$ with probability at most p (either
probability p or probability 0 depending on the colors of the endpoints), while in G(n, p),
each edge is placed into the graph with probability exactly p. So, if we pick $\epsilon$ sufficiently
small such that $\mu(H, G(n,p)) < n^{\epsilon'}$ and thus Spencer’s argument holds for $G' \leftarrow G(n,p)$,
then the third part of Spencer’s argument carries over directly and we need not prove it
again here. (Recall that $\mu = \Theta(n^v p^e)$, so for any $\epsilon' > 0$ there exists $\epsilon > 0$ such that
$(\mu < n^\epsilon) \Rightarrow (n^v p^e < n^{\epsilon'})$.) We focus now on the second part. The analysis here is taken
directly from the proof of Spencer [35].

Let us first calculate some basic quantities. First, the number of $X_i$ in $K_n$ is at most $n^v$,
so we can loosely upper bound the number of families $F \subseteq \mathrm{Im}(H, K_n)$ of t pairwise disjoint
images $X_i$ by $\binom{n^v}{t}$. Also, for any fixed such family F, the probability that $F \subseteq \mathrm{Im}(H, G)$
(that is, that the $X_i$ in F are all in G) is $(\hat{p}^e)^t$ since the $X_i \in F$ are all disjoint so the
corresponding events $A_i$ are mutually independent.

For a given family F of t pairwise disjoint $X_i$, we now upper bound the probability
that no image $X_j$ disjoint from all $X_i \in F$ exists in G; that is, the probability that F is
a maximal family of disjoint images given that $F \subseteq \mathrm{Im}(H, G)$. Let $X_{i_1}, \ldots, X_{i_r}$ be all the
images in $\mathrm{Im}(H, K_n)$ disjoint from F. We know that:

$$r = (n - tv - |R|)_v,$$

(a falling factorial) since there are $(n - tv - |R|)$ non-root vertices not inside F, and H has
no automorphisms. By Theorem 8.3, for $\epsilon$ sufficiently small, $\sum_{i \sim j} \Pr[A_i \wedge A_j] = o(1)$ where
$i \sim j$ if $i \ne j$ and $E(X_i) \cap E(X_j) \ne \emptyset$. So, certainly the summation restricted to just the
$i, j \in \{i_1, \ldots, i_r\}$ equals o(1) as well. Thus, by Janson’s inequality (noting that the “$\Delta$”
term is o(1)) we have:

$$\Pr\Big[\bigwedge_{j=1}^{r} \overline{A_{i_j}}\Big] \le [1 + o(1)] \prod_{j=1}^{r} \Pr\big[\overline{A_{i_j}}\big] = [1 + o(1)](1 - \hat{p}^e)^r. \qquad \text{(by definition of } \hat{p}\text{)}$$

Given the above facts, we can upper bound the probability $P_t$ that there exists any
maximal family F of t pairwise disjoint $X_i$’s within G by the quantity:

$$P_t \le [1 + o(1)] \binom{n^v}{t} (\hat{p}^e)^t (1 - \hat{p}^e)^{(n - tv - |R|)_v}.$$

Consider now two cases. First, suppose $t > n^{2\epsilon}$. We may upper bound $\binom{n^v}{t}$ by
$\frac{n^{vt}}{t!} < \left(\frac{3n^v}{t}\right)^t$. (Using 3 > 2.718... to avoid confusion with e.) So, since $n^v \hat{p}^e = O(n^\epsilon)$ we have:

$$P_t \le [1 + o(1)] \left(\frac{3 n^v \hat{p}^e}{t}\right)^t \le O(n^{-\epsilon})^{n^{2\epsilon}} = o(n^{-(c+v)}).$$


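Thus, the probability there exists within G a maximal family F of any size $t > n^{2\epsilon}$ is at
most $o(n^{-c})$.

The second case is $t \le n^{2\epsilon}$. For $\epsilon$ sufficiently small (at most 1/4) we have $t \le n^{1/2}$ so:
$(n - tv - |R|)_v = n^v - O(tvn^{v-1}) \ge n^v - O(n^{v-1/2})$. Thus,

$$
\begin{aligned}
(1 - \hat{p}^e)^{(n - tv - |R|)_v} &= (1 - \hat{p}^e)^{n^v - O(tvn^{v-1})} \\
&\le (1 - \hat{p}^e)^{n^v - O(n^{v-1/2})} \\
&\le (1 - \hat{p}^e)^{n^v} / (1 - O(n^{\epsilon} n^{-1/2})) \\
&= (1 - \hat{p}^e)^{n^v} [1 + o(1)].
\end{aligned}
$$

So, we can upper bound the probability $P_t$ by:

$$P_t \le (1 + o(1)) \binom{n^v}{t} (\hat{p}^e)^t (1 - \hat{p}^e)^{n^v} \le (1 + o(1)) \binom{n^v}{t} (\hat{p}^e)^t (1 - \hat{p}^e)^{n^v - t}.$$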


Thus, $P_t \le (1 + o(1)) \Pr[Y = t]$ where Y has the binomial distribution $B(n^v, \hat{p}^e)$. Let
$\mu^* = n^v \hat{p}^e$. We know for such a distribution, for any $\delta > 0$ we have
$\Pr[|Y - \mu^*| > \delta\mu^*] = o(n^{-c})$, so long as $\mu^* > K \log n$ for sufficiently large K. Thus, the
probability there exists any maximal family F of disjoint images $X_i$ of size not within $\delta\mu$ of
$\mu$, and so not within $\frac{1}{2}\delta\mu^*$ of $\mu^*$, is at most $o(n^{-c})$. This finishes the second part of
Spencer’s argument. Since, as noted above, the third part follows immediately from the result
for G(n, p), we have proved the theorem. ∎